1 Research activities

The  main  results  of  the  research  activities  supported  by  the  Air  Force  Office  of  Scientific 
Research  (AFOSR)  were  described  in  great  detail  and  made  public  in  the  five  papers 
listed  below  and  appended  to  the  end  of  this  report  (see  contents). 


1. Jose F. Fontanari and Leonid I. Perlovsky, “Evolving compositionality in
evolutionary language games”, IEEE Transactions on Evolutionary Computation,
published on-line, doi:10.1109/TEVC.2007.892763

2. Jose F. Fontanari and Leonid I. Perlovsky, “Inverse density dependence in the
evolution of communication”, submitted to Journal of Theoretical Biology.

3. Jose F. Fontanari and Leonid I. Perlovsky, “How communication can improve
differentiation in the Modeling Field Theory framework”, Proceedings of the IEEE
International Conference on Integration of Knowledge Intensive Multi-Agent
Systems KIMAS07, Waltham, MA (ISBN: 1-4244-0945-4), pp. 151-156 (2007)

4. Jose F. Fontanari and Leonid I. Perlovsky, “Language acquisition and category
discrimination in the Modeling Field Theory framework”, Proceedings of the
International Joint Conference on Neural Networks (IJCNN07), Orlando, FL.

5. Angelo Cangelosi, Vadim Tikhanoff, José F. Fontanari and Emmanouil
Hourdakis, “Integrating Language and Cognition: A Cognitive Robotics
Approach”, invited contribution to IEEE Computational Intelligence Magazine.

The first two papers address the main topic of investigation of the research proposal. In
particular, we have introduced a simple structured meaning-signal mapping, where meanings
and signals are represented by integers and the metrics of the meaning and signal spaces are
specified by the simple subtraction operation. Of particular relevance is our finding that
structured (or compositional) communication codes cannot evolve within the traditional
evolutionary language game setting: the evolutionary dynamics is plagued by local maxima
that do not reflect the inner organization of the meaning and signal spaces. In the paper
“Evolving compositionality in evolutionary language games” we have proposed an
alternative learning scheme in which the individuals or agents learn the signal-meaning
associations one by one - a procedure named sequential meaning assimilation. Provided the
meanings are presented in an order that conforms to their proximity in the meaning space,
this scheme leads to the emergence of structured communication codes.
In the paper “Inverse density dependence in the evolution of communication”, we maintain
the parallel or simultaneous presentation of all meanings but allow for some structure in the
population, so that individuals adopting similar communication codes meet more frequently
than individuals that adopt different codes. We then show that, provided the aggregation
pressure is sufficiently strong, structured codes are likely to emerge and become established
in the population.

The last three papers do not directly address the emergence of compositionality.
Rather, they focus on a more basic problem - the selective pressures responsible for the
evolution of communication. In particular, we show that the exchange of information
between Modeling Field Categorization systems (or agents) can greatly improve the
discriminating capability of each agent, in the sense that they become capable of differentiating
objects or categories that they could not distinguish without language. We note that an
extension of paper 4 was accepted for publication in the special issue “Advances in Neural
Networks Research - IJCNN 2007 Orlando” of the influential journal Neural Networks.
Finally, paper 5 is a product of the research effort done in collaboration with the group led
by Dr. Angelo Cangelosi at the University of Plymouth, to implement the abstract
framework described in papers 3 and 4 in a robotics scenario.
Evolving Compositionality in
Evolutionary  Language  Games 

Jose  Fernando  Fontanari  and  Leonid  I.  Perlovsky,  Senior  Member,  IEEE 


Abstract - Evolutionary language games have proved a useful
tool to study the evolution of communication codes in communities
of agents that interact among themselves by transmitting and
interpreting a fixed repertoire of signals. Most studies have focused on
the emergence of Saussurean codes (i.e., codes characterized by an
arbitrary one-to-one correspondence between meanings and
signals). In this contribution, we argue that the standard evolutionary
language game framework cannot explain the emergence of
compositional codes (communication codes that preserve
neighborhood relationships by mapping similar signals into similar
meanings) even though use of those codes would result in a much higher
payoff in the case that signals are noisy. We introduce an
alternative evolutionary setting in which the meanings are assimilated
sequentially and show that the gradual building of the
meaning-signal mapping leads to the emergence of mappings with the
desired compositional property.

Index Terms - Complexity theory, game theory, genetic
algorithms, simulation.


I. Introduction

The case for the study of the evolution of communication
within a multiagent framework was probably best
made by Ferdinand de Saussure in his famous statement:

“language is not complete in any speaker; it exists only
within a collectivity... only by virtue of a sort of contract
signed by members of a community” [1].

Translated into the biological jargon, this assertion means that
language is not the property of an individual, but the extended
phenotype of a population [2]. More than one decade ago,
seminal computer simulations were carried out to demonstrate
that cultural [3] as well as genetic [4] evolution could lead to
the emergence of ideal communication codes (i.e., arbitrary
one-to-one correspondences between objects or meanings and
signals), termed Saussurean codes, in a population of
interacting agents. Typically, the behavior pattern of the agents was
modeled by (probabilistic) finite-state machines. The work by
Hurford [3], in particular, set the basis of the Iterated Learning
Model (ILM) for the cultural evolution of language, the typical
realization of which consists of the interaction between two
agents: a pupil that learns the language from a teacher [5]. In
those studies, language is viewed as a mapping between
meanings and signals. The communication codes that emerged from
the agents’ interactions are, in general, noncompositional or
holistic communication codes, in which a signal stands for the
meaning as a whole. In contrast, a compositional language is
a mapping that preserves neighborhood relationships: similar
signals are mapped into similar meanings. If there is a nontrivial
structure in both meaning and signal spaces then, in certain
circumstances, making explicit use of those structures may
greatly improve the communication accuracy of the agents. The
emergence of compositional languages in the ILM framework
beginning from holistic ones in the presence of bottlenecks
on cultural transmission was considered a breakthrough in the
computational language evolution field [5]-[7]. The aim of
this contribution is to understand how compositional
communication codes can emerge in an evolutionary language game
framework [3], [4], [8], [9].

The way we introduce the structure of the signal space (i.e.,
the notion of similarity between signals) into the rules of the
language game is through errors in perception: the signals are
assumed to be corrupted by noise so that they can be mistaken for
one of their neighbors in signal space [8]. Similarly, the structure
of the meaning space enters the game by rewarding the agents
that, prompted by a signal, infer a meaning close to the meaning
actually intended by the emitter. Of course, the reward for
incorrect but close inferences must be smaller than that granted
for the correct inference of the intended meaning (see [9] for a
similar approach). Hence, the role played by noise in this
context is similar to the role of the bottleneck transmissions in the
ILM framework, since both make advantageous the exploration
of the detailed structure of the meaning-signal mapping. In
particular, we show that once a Saussurean communication code is
established in the population, i.e., all agents use the same code,
it is impossible for a mutant to invade, even if the mutant uses
a better code, say, a compositional one. This is essentially the
Allee effect [10], [11] of population dynamics, which asserts that
intraspecific cooperation might lead to inverse density
dependence, resulting in the extinction of some (social) animal species
when their population size becomes small. Of course, this
effect is germane to the outcome of biological invasions involving
such species. We note that most realizations of the ILM
circumvent this difficulty by assuming that the population is composed
of two agents only, the teacher and the pupil, and that the latter
always replaces the former. However, according to de Saussure
(see quotation above), this is not an acceptable framework for
language. In addition, a bias toward compositionality is built
into the inference procedure used by the pupil to fill in the gaps
due to transmission bottlenecks, in which some of the meanings
are not taught to the pupil. This bias toward generalization,
together with cultural evolution, seem to be the key ingredients
to evolve compositionality in the ILM framework.

Understanding as well as demonstrating how innovations that
increase the expressive power of individuals can spread through
a population is the essence of any evolutionary explanation of
language evolution [9]. Accordingly, the solution we propose to
the problem of evolving a compositional code in a population of
agents that exchange signals with each other and receive rewards
at every successful communication event is the incremental
assimilation of meanings, i.e., the agents construct their
communication codes gradually, by seeking a consensus signal for a
single meaning at a given moment. Only after a consensus is
reached is a novel meaning permitted to enter the game. This
sequential procedure, which dovetails with the classic Darwinian
explanation of the evolution of strongly coordinated systems,
allows for the emergence of fully compositional codes, an
outcome that we argue is very unlikely, if not impossible, in the
traditional language game scenario in which the consensus signals
are sought simultaneously for the entire repertoire of meanings.

II.  Model 

Here, we take the more conservative viewpoint that language
evolved from animal communication as a means of exchanging
relevant information between individuals rather than as a
byproduct of animal cognition or representation systems (see,
e.g., [12] and [13] for the opposite viewpoint). In particular,
we consider a population composed of $N$ agents who make
use of a repertoire of $m$ signals to exchange information
about $n$ objects. Actually, since the groundbreaking work of
de Saussure [1], it is known that signals refer to real-world
objects only indirectly, as first the sense perceptions are mapped
onto a conceptual representation (the meaning) and then this
conceptual representation is mapped onto a linguistic
representation (the signal). Here, we simply ignore the object-meaning
mapping (see, however, [14] and [15]) and use the words
object and meaning interchangeably. To model the interaction
between the agents, we borrow the language game framework
proposed by Hurford [3] (see also [8]) and assume that each
agent is endowed with separate mechanisms for transmission
(i.e., communication) and for reception (i.e., interpretation).
More pointedly, for each agent we define an $n \times m$ transmission
matrix $P$ whose entries $p_{ij}$ yield the probability that object $i$
is associated with signal $j$, and an $m \times n$ reception matrix $Q$,
the entries of which, $q_{ji}$, denote the probability that signal $j$ is
interpreted as object $i$. Henceforth, we refer to $P$ and $Q$ as the
language matrices. In general, the entries of these two matrices
can take on any value in the range $[0,1]$ satisfying the
constraints $\sum_{j=1}^{m} p_{ij} = 1$ and $\sum_{i=1}^{n} q_{ji} = 1$, in conformity with
their probabilistic interpretation. In this contribution, however,
we consider the case of binary matrices, in which the entries
of $Q$ and $P$ can assume the values 0 and 1 only. There are
two reasons for that. First, in the absence of errors in language
learning, the evolutionary language game will eventually lead
to binary transmission and reception matrices, regardless of
the values of $m$ and $n$, and of the initial choice for the entries
of those matrices [16]. Therefore, our restriction of the entry
values to binary quantities has no effect on the equilibrium
solutions of the evolutionary game. In addition, these
deterministic encoders and decoders were shown to perform better
than their stochastic variants [17]. Second, by assuming that
the transmission and reception matrices are binary, we recover
the synthetic ethology framework proposed by MacLennan [4],
a seminal agent-based work on the evolution of communication
in a population of finite state machines (see also [18]).

Although the reception matrix $Q$ is, in principle, independent
of the transmission matrix $P$, results of early computer
simulations have shown that in a noiseless environment, the optimal
communication strategy is the Saussurean two-way arbitrary
relationship between an object and a signal, i.e., the matrices $P$
and $Q$ are linked such that if $p_{ij} = 1$ for some object-signal
pair $i, j$, then $q_{ji} = 1$ [3]. These matrices are associated with the
Saussurean communication codes introduced before, provided
there are no correlations between the different rows of the
matrix $P$, i.e., the object-signal assignment is arbitrary.
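The binary language matrices and the Saussurean link between them can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper names `transmission_matrix`, `saussurean_reception`, and `rows_sum_to_one` are hypothetical, a code is stored as a list giving the signal used for each object, and the tie-breaking rule for shared or unused signals is an arbitrary choice made here, not something specified in the text.

```python
# Hypothetical helpers (not from the paper): binary language matrices
# for a code given as a list where code[i] is the signal used for object i.

def transmission_matrix(code, m):
    """n x m binary matrix P with P[i][j] = 1 iff object i is sent as signal j."""
    return [[1 if j == s else 0 for j in range(m)] for s in code]

def saussurean_reception(P):
    """m x n binary matrix Q linked to P.

    For a one-to-one code this gives Q[j][i] = 1 exactly when P[i][j] = 1.
    Assumptions made here only so that every row of Q sums to 1:
    if several objects share signal j, the lowest-numbered one is chosen,
    and a signal nobody uses is decoded as object 0.
    """
    n, m = len(P), len(P[0])
    Q = [[0] * n for _ in range(m)]
    for j in range(m):
        users = [i for i in range(n) if P[i][j] == 1]
        Q[j][users[0] if users else 0] = 1
    return Q

def rows_sum_to_one(M):
    """Check the probabilistic constraint: each row is a distribution."""
    return all(sum(row) == 1 for row in M)

code = [2, 0, 3, 1]                 # an arbitrary one-to-one assignment, n = m = 4
P = transmission_matrix(code, m=4)
Q = saussurean_reception(P)
```

Because `code` is a permutation, `P` and `Q` here are transposes of each other, which is the two-way object-signal relationship described above.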


A.  The  Evolutionary  Language  Game 

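Given the transmission and reception matrices, the
communicative accuracy or overall payoff for communication between
two agents, say $I$ and $J$, is defined as [3], [8], [19]

$$F(I,J) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( p_{ij}^{(I)} q_{ji}^{(J)} + p_{ij}^{(J)} q_{ji}^{(I)} \right) \qquad (1)$$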


from which we can observe the symmetry of the language game,
i.e., both signaler and receiver are rewarded whenever a
successful communication event takes place. By assuming such a
symmetry, one ignores a serious hindrance to the evolution of
language: passing useful information to another agent is an
altruistic behavior [20], [21] that can be maintained in human
societies thanks to the development of reciprocal altruism, in
which unrelated individuals mutually benefit by exchanging the
donor and the receiver roles multiple times [22]. However, the
scarcity of empirical demonstrations of reciprocal altruism in
nature, except for modern humans, motivated an alternative
scenario for the evolution of language, namely, that human
language evolved as a “mother tongue”, a communication system
used among kin, especially between parents and their offspring
[23].
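As a rough illustration of the pairwise payoff in (1), the sketch below assumes the standard symmetric form, in which each agent's transmission matrix is paired with the other agent's reception matrix and the two directions are averaged with a factor 1/2. The function names `permutation_matrices` and `payoff` are hypothetical and are not taken from the paper.

```python
def permutation_matrices(code):
    """Binary P and Q for a one-to-one code; code[i] is the signal for object i."""
    n = len(code)
    P = [[0] * n for _ in range(n)]
    Q = [[0] * n for _ in range(n)]
    for i, j in enumerate(code):
        P[i][j] = 1   # object i is transmitted as signal j
        Q[j][i] = 1   # signal j is interpreted as object i
    return P, Q

def payoff(P_I, Q_I, P_J, Q_J):
    """Assumed form of (1): half the sum over objects i and signals j of
    P_I[i][j] * Q_J[j][i] (I speaks, J listens) plus
    P_J[i][j] * Q_I[j][i] (J speaks, I listens)."""
    n, m = len(P_I), len(P_I[0])
    total = 0.0
    for i in range(n):
        for j in range(m):
            total += P_I[i][j] * Q_J[j][i] + P_J[i][j] * Q_I[j][i]
    return 0.5 * total

P_a, Q_a = permutation_matrices([0, 1, 2])   # agent a
P_b, Q_b = permutation_matrices([1, 0, 2])   # agent b swaps the first two signals

same_code = payoff(P_a, Q_a, P_a, Q_a)       # 3.0: every object is understood
mixed = payoff(P_a, Q_a, P_b, Q_b)           # 1.0: only object 2 is understood
```

The function is symmetric in its two agents by construction, which is the symmetry of the language game noted above.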

In this contribution, we assume the validity of (1) and simply
ignore the costs of honest signaling [20]. Hence, we take for
granted the existence of special social conditions to foster
reciprocal altruism among the agents or, alternatively, a mother
tongue scenario in which the agents are related to each other. In
this vein, it is interesting to note that although in the work by
MacLennan [4] communication is defined following Burghardt
[24] as “the phenomenon of one organism producing a signal
that when responded to by another organism, confers some
advantage to the signaler or his group” (see [25] for alternative
definitions of communication), the actual implementation of the
simulation rewards equally the two agents that take part in the
successful communication event. In the case where only the
receiver is rewarded, Saussurean communication fails to evolve
[26].
Assuming, in addition, that each agent $I$ interacts with every
other agent $J = 1, \ldots, N$ ($J \neq I$) in the population, we can
immediately write down the total payoff received by $I$

$$F_I = \frac{1}{N-1} \sum_{J \neq I} F(I,J) \qquad (2)$$

in which the sole purpose of the normalization factor is to
eliminate the trivial dependence of the payoff measure on the
population size $N$. Following the basic assumption of evolutionary
game theory [27] this quantity is interpreted as the fitness of
agent $I$. Explicitly, we assume that the probability that $I$
contributes with an offspring to the next generation is given by the
relative fitness

$$w_I = F_I \Big/ \sum_{J} F_J \qquad (3)$$

which essentially implies that mastery of a public
communication system adds to the reproductive potential of the agents [3].
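The total payoff and the relative fitness can be sketched as follows. The sketch assumes that the normalization factor is $1/(N-1)$, takes a precomputed table of pairwise payoffs as input, and uses fitness-proportional sampling to choose a parent. The names `total_payoffs`, `relative_fitness`, and `pick_parent` are hypothetical, and the uniform fallback for an all-zero population is an assumption added to avoid division by zero.

```python
import random

def total_payoffs(pair_payoff):
    """pair_payoff[I][J] is the payoff between agents I and J.

    Returns, for every agent I, the sum over J != I divided by N - 1
    (the normalization assumed in this sketch).
    """
    N = len(pair_payoff)
    return [
        sum(pair_payoff[I][J] for J in range(N) if J != I) / (N - 1)
        for I in range(N)
    ]

def relative_fitness(F):
    """w_I = F_I / sum_J F_J; falls back to uniform weights if all payoffs are 0."""
    total = sum(F)
    if total == 0:
        return [1.0 / len(F)] * len(F)
    return [f / total for f in F]

def pick_parent(w, rng):
    """Draw the index of the agent that contributes one offspring."""
    return rng.choices(range(len(w)), weights=w, k=1)[0]

# Three agents: 0 and 1 share a code, agent 2 uses a different one.
pair_payoff = [
    [0.0, 3.0, 1.0],
    [3.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
F = total_payoffs(pair_payoff)      # [2.0, 2.0, 1.0]
w = relative_fitness(F)             # [0.4, 0.4, 0.2]
parent = pick_parent(w, random.Random(0))
```

In this toy population the agent with the minority code is half as likely to reproduce as either of the other two.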

There are several distinct ways to implement the
language game. For instance, MacLennan [4] and Fontanari and
Perlovsky [18] stick to the genetic algorithm approach (see,
e.g., [28]) in which the offspring acquires both the transmission
and reception matrices from its parent, assuming clonal or
asexual reproduction. The offspring is identical to its parent
except for the possibility of mutations that may alter a few rows
of the language matrices. However, here we take a different
viewpoint and reinterpret this genetic model within a learning
context. We assume, in particular, that the offspring actually
learns the language from its parent but that the learning is not
perfect: there is a probability $\mu$ that the communication code
it acquires is slightly different from its parent’s. This very
framework has been used to study the emergence of universal
grammar and syntax in language [2], [29], [30].
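One possible reading of this imperfect learning step is sketched below. The text only says that the acquired code is "slightly different" with some probability, so the specific error used here (reassigning a single randomly chosen object to a random signal) is an assumption of this sketch, as are the name `learn_code` and the representation of a binary transmission matrix as a list of signals. Only the transmission side is shown.

```python
import random

def learn_code(parent_code, m, mu, rng):
    """Return the offspring's code learned from parent_code.

    parent_code[i] is the signal the parent uses for object i.
    With probability mu a learning error occurs: one randomly chosen
    object is reassigned to a signal drawn uniformly from 0..m-1
    (an illustrative error model, not the paper's exact rule).
    """
    child = list(parent_code)           # perfect copy by default
    if rng.random() < mu:
        i = rng.randrange(len(child))   # object whose signal is mislearned
        child[i] = rng.randrange(m)     # may coincide with the old signal
    return child

rng = random.Random(1)
parent = [0, 1, 2, 3]
faithful = learn_code(parent, m=4, mu=0.0, rng=rng)   # always an exact copy
noisy = learn_code(parent, m=4, mu=1.0, rng=rng)      # differs in at most one row
```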

An alternative learning scenario used by Nowak and
Krakauer [8] assumes that the offspring adopts the language
of its parent by sampling its response to every object $k$ times.
This approach makes sense only if the language matrices are
not binary, though, as mentioned before, in the long run those
matrices must become binary. For $k \to \infty$, the offspring is
identical to its parent, which corresponds then to $\mu = 0$ in the
previous learning scenario, whereas differences between parent
and offspring arise in the case of finite $k > 1$. This sampling
effect is qualitatively similar to the effect of learning errors
in the scenario introduced before. For $k = 1$, already the first
generation of offspring communicates through binary language
matrices and so the sampling procedure is rendered ineffective.
The reason is that a binary matrix $P$ assigns each object to a
unique signal (though this same signal can be used also for a
distinct object), and so sampling the responses of the parent
to the same object will always yield the same signal. As a
result, the evolutionary process based on learning by sampling
halts: the offspring become identical to their parents.
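The sampling argument above can be checked with a small sketch. It assumes that the offspring sets each entry of its transmission matrix to the observed frequency of the corresponding signal among $k$ sampled responses, which is one natural reading of learning by sampling and is not spelled out in the text. The name `sample_learn` is hypothetical.

```python
import random

def sample_learn(P_parent, k, rng):
    """Estimate the parent's transmission matrix from k samples per object.

    For each object i, the parent's signal is drawn k times from row i of
    P_parent, and the offspring's entry is the observed frequency.
    """
    n, m = len(P_parent), len(P_parent[0])
    P_child = []
    for i in range(n):
        counts = [0] * m
        for _ in range(k):
            j = rng.choices(range(m), weights=P_parent[i], k=1)[0]
            counts[j] += 1
        P_child.append([c / k for c in counts])
    return P_child

rng = random.Random(7)

# A binary parent always answers with the same signal, so sampling copies it exactly.
P_binary = [[1, 0], [0, 1]]
copy_of_binary = sample_learn(P_binary, k=5, rng=rng)

# With k = 1 even a stochastic parent produces a binary offspring matrix.
P_stochastic = [[0.5, 0.5], [0.3, 0.7]]
one_sample = sample_learn(P_stochastic, k=1, rng=rng)
```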

A similar but more culturally inclined approach is that
followed by Hurford [3] and Nowak et al. [16]: instead of sampling
the parent’s responses, the offspring samples the responses of a
certain number of agents in the population or even of the
entire population. In this case, the hereditary component is lost
since the offspring, in general, will not resemble its parent, and
so natural selection has no say in the outcome of the dynamics.
In the case of Hurford [3], there is still a strong genetic
component as the offspring inherits from its parent its strategy of
inference. Similarly, the ILM for the cultural evolution of
language (see [5] and [7] for reviews) in its more popular version
consists of two agents only, the teacher and the pupil who learns
from the teacher through a sampling process identical to that
just described. The pupil then replaces the teacher and a new,
tabula rasa pupil is introduced in the scenario. This procedure
is iterated until convergence is achieved. In this case, the payoff
(2) plays no role at all in the language evolutionary process and
the stationary language matrices will depend strongly on the
inference procedure used by the pupil to create a meaning/signal
mapping from the teacher responses. Of particular interest for
our purpose is the finding that compositional codes emerge in
the case that the learning strategy adopted by the pupil supports
generalization and that this ability is favored by the
introduction of transmission bottlenecks in the communication between
teacher and pupil. Such a bottleneck occurs when the learner
does not observe the signal for some objects. This contrasts with
the sampling effect mentioned before in which the learner
observes the signals to every object. In this contribution, we study
whether and in what conditions compositional codes emerge in
an evolutionary language game.

B.  The  Meaning-Signal  Mapping 

As already pointed out, language is viewed as a mapping
between objects (or meanings) and signals and compositionality
is a property of this mapping: a compositional language is a
mapping that preserves neighborhood relationships, i.e., nearby
meanings in the meaning space are likely to be associated to
nearby signals in signal space [5]. At first sight, this notion looks
contradictory to the well-established fact that the relation
between a word (signal) and its meaning is utterly arbitrary. For
instance, as pointed out by Pinker [31],

“babies should not, and apparently do not, expect cattle
to mean something similar to battle, or singing to be like
stinging, or coats to resemble goats.”

In fact, Pettito demonstrated that the arbitrariness of the
relation between a sign and its meaning is deeply entrenched in
the child’s mind [32]. On the other hand, sentences like John
walked and Mary walked have parts of their semantic
representation in common (someone performed the same act in the
past) and so the meaning of these sentences must be close in the
meaning space. Since both sentences contain the word walked
they must necessarily be close in signal space as well. Following
Pinker, we acknowledge a significant degree of arbitrariness at
the level of word-object pairing. This might be a consequence
of a much earlier (prehuman) origin of this mechanism, as
compared with seemingly distinctly human mind mechanisms for
sentence-situation pairing. From a mathematical modeling
perspective, however, such a distinction is not essential for our
purposes, since the signals (sentences or words) can always be
represented by a single symbol; only the “distance” between them
will reflect the complex inner structure of the signal space. For
instance, suppose there are only two words that we represent,
without loss of generality, by 0 and 1. Hence, a binary sequence
or, equivalently, its decimal representation can represent any
sentence in this language. Here, the relevant distance between
two such sentences is the Hamming distance rather than the
result of the subtraction between their labeling integers. This
notion, of course, generalizes trivially to the case when the
sentences are composed of more than two types of words.

Fig. 1. Example of a mapping meaning-signal for n = m = 4. The integers
here may be viewed as labels for complex entities (e.g., sentences). The large
circles indicate cyclic boundary conditions so that, e.g., signal 1 is 1 unit distant
from signals 2 and 4. The code represented in the figure has compositionality
C = 1.
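The difference between the Hamming distance and plain integer subtraction for sentences built from two words can be made concrete with a short sketch. The function name `hamming_distance` is hypothetical, and sentences are represented by the integers whose binary digits spell out the word sequence.

```python
def hamming_distance(a, b):
    """Number of word positions in which two binary-coded sentences differ."""
    return bin(a ^ b).count("1")

# Sentences 0111 and 1000 have adjacent integer labels but share no word.
close_labels = (abs(7 - 8), hamming_distance(7, 8))      # (1, 4)

# Sentences 0000 and 1000 differ in one word but have distant labels.
distant_labels = (abs(0 - 8), hamming_distance(0, 8))    # (8, 1)
```

The two examples show that the labeling integers alone do not carry the structure of the signal space; it is the chosen distance that does.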

For simplicity, in this paper, we consider the case where both
signals and meanings are represented by integer numbers and
the relevant distance in both signal and meaning space is the
result of the usual subtraction between integers. Fig. 1
illustrates one of the $n \times m$ possible meaning-signal mappings.
A quantitative measure of the compositionality of a
communication code is given by the degree to which the distances
between all the possible pairs of meanings correlate with the
distances between their corresponding pairs of signals [7].
Explicitly, let $\Delta m_{ij}$ be the distance between meanings $i$ and $j$, and
$\Delta s_{ij}$ the distance between the signals associated to these two
meanings. Introducing the averages $\overline{\Delta m} = \sum_{(ij)} \Delta m_{ij}/p$ and
$\overline{\Delta s} = \sum_{(ij)} \Delta s_{ij}/p$, where the sum is over all distinct pairs
$p = n(n-1)/2$ of meanings, the compositionality of a code is
defined as the Pearson correlation coefficient [7]

$$C = \frac{\sum_{(ij)} \left( \Delta m_{ij} - \overline{\Delta m} \right) \left( \Delta s_{ij} - \overline{\Delta s} \right)}{\left[ \sum_{(ij)} \left( \Delta m_{ij} - \overline{\Delta m} \right)^2 \sum_{(ij)} \left( \Delta s_{ij} - \overline{\Delta s} \right)^2 \right]^{1/2}}$$

so that $C \approx 1$ indicates a compositional code and $C \approx 0$ an
unstructured or holistic code. This definition applies only to codes
that implement a (not necessarily arbitrary) one-to-one
correspondence between meaning and signal.
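The compositionality measure can be sketched directly as a Pearson correlation over all pairs of meanings. The sketch assumes that distances are taken on a ring, following the cyclic boundary conditions mentioned in the caption of Fig. 1, and it labels meanings and signals from 0 rather than 1. The names `cyclic_distance` and `compositionality` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cyclic_distance(a, b, size):
    """Distance between integers a and b on a ring with `size` sites (assumed metric)."""
    d = abs(a - b) % size
    return min(d, size - d)

def compositionality(code, m):
    """Pearson correlation between meaning distances and signal distances.

    code[i] is the signal for meaning i; the code must be one-to-one.
    Raises ValueError when all distances in one space are equal (e.g. n = 3
    on a ring), because the correlation is then undefined.
    """
    n = len(code)
    pairs = list(combinations(range(n), 2))          # the p = n(n-1)/2 distinct pairs
    dm = [cyclic_distance(i, j, n) for i, j in pairs]
    ds = [cyclic_distance(code[i], code[j], m) for i, j in pairs]
    mean_m = sum(dm) / len(pairs)
    mean_s = sum(ds) / len(pairs)
    cov = sum((a - mean_m) * (b - mean_s) for a, b in zip(dm, ds))
    var_m = sum((a - mean_m) ** 2 for a in dm)
    var_s = sum((b - mean_s) ** 2 for b in ds)
    if var_m == 0 or var_s == 0:
        raise ValueError("distances have zero variance; C is undefined")
    return cov / sqrt(var_m * var_s)

c_identity = compositionality([0, 1, 2, 3], m=4)   # neighbors map to neighbors
c_shifted = compositionality([1, 2, 3, 0], m=4)    # a rotation preserves distances
c_swapped = compositionality([0, 2, 1, 3], m=4)    # neighbors are torn apart
```

For $n = m = 4$ the identity and rotated codes give $C = 1$, while the code that swaps two interior signals gives $C = -0.5$ under the cyclic metric assumed here.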

Strictly, here we do not address directly the emergence of
compositionality, defined as the property that the meaning of
a complex expression is determined by the meanings of its parts
and the rules used to combine them. Rather, we focus on the
emergence of structured communication codes, which preserve
the topology of the meaning-signal mapping, in that similar
meanings are associated with similar signals and vice versa. It
seems that an important aspect of joint evolution of
compositional cognition and compositional language is their evolution
along with structural metric (or approximately metric) spaces
of cognition and meaning. In this contribution, we assume that a
metric space exists, and explore the consequences for the
emergence of compositionality. The connection between structured
and compositional meaning-signal mappings can be made
explicit if we consider an artificial scenario for which there is a
prescription to derive the meaning of the whole given the meaning
of the elementary parts. (Such prescription is clearly ruled out
in real language since context and previous knowledge play a
crucial role in our understanding of any situation.) In this case,
the distance between any two composite meanings could be
inferred by comparing their components and, consequently, by
introducing a metric in the meaning space.

Our approach ties in with the view that properties of language
such as compositionality are emergent characteristics of the
explosion of semantic complexity that occurred during hominid
evolution [33]. Semantic complexity means not only a large number
of cognitive categories (meanings) but also an increase in their
perceived interrelationships, which are inherent properties of
the topology of the meaning space. In fact, the number of
objects for which a person has separate words is not too large;
a recent estimate suggests a vocabulary of around 60,000 base
words for well-educated adult native speakers of English [34].
This is not a very big number, and so it is reasonable to assume
that object-word associations can be learned from examples, one
by one. The number of situations that are combinations of
objects, on the other hand, is larger than the number of all
elementary particle events in the history of the Universe. This supports
a need for the assumption of compositionality in language. As
hinted in [33], a natural avenue to study the evolution of
complex features of language (e.g., compositionality) is the increase
of the complexity of the meaning space, which is exactly the
approach we offer in this contribution.

C.  Errors  in  Perception 

So far as the communicative accuracy introduced in (1) is
concerned, the structures of the meaning and signal spaces are
irrelevant to the outcome of the evolutionary language game;
the total population payoff is maximized when all agents adopt
a code that implements a one-to-one correspondence between
meanings and signals. Such a code is, of course, described by
any one of the $n!$ permutation language matrices. The fact that
ultimately all agents adopt the same communication code is a
general result of population genetics related to the effect of
genetic drift on a finite population [35]. To permit the structures of
the meaning and signal spaces to play a role in the evolutionary
game and so to break the symmetry among the permutation
matrices so as to favor the compositional codes, we must introduce
a new ingredient in the language game, namely, the possibility
of errors in perception [8]. In fact, it is reasonable to assume
that in the earlier stages of the evolution of communication the
signals were likely to be noisy and so they could be easily
mistaken for each other. The relevance of the structure of the signal
space becomes apparent when we note that the closer two
signals are, the higher the chances that they are mistaken for each
other. This aspect of the model can be described by an
agent-independent $m \times m$ confusion matrix $E$, the entries of which, $e_{ij}$,
yield the probability of signal $j$ being observed as signal $i$ due
to corruption by noise [8], [9].
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To  introduce  the  structure  of  the  meaning  space  in  the  lan¬ 
guage  game,  we  note  first  that  (1)  has  a  simple  interpretation 
in  the  case  of  binary,  but  not  necessarily  permutation,  language 
matrices:  both  signaler  and  receiver  are  rewarded  with  1/2  unity 
of  payoff  whenever  the  receiver  interprets  correctly  the  meaning 
of  the  emitted  signal.  Otherwise,  there  is  no  reward  to  any  of 
the  two  parts,  no  matter  how  close  the  inferred  meaning  is  from 
the  correct  one.  This  gives  us  a  clue  as  to  how  to  modify  the 
model  in  order  to  take  into  account  the  meaning  structure — just 
ascribe  some  small  reward  value  to  both  agents  if  the  inferred 
meaning  is  close  to  the  intended  one.  In  fact,  giving  value  to  de¬ 
cisions  which  are  not  the  best  ones  is  a  common  assumption  in 
decision  and  game  theory  [36]  and  seems  to  be  consistent  with 
what  is  actually  observed  in  nature  since,  clearly,  not  every  mis¬ 
interpretation  is  equally  harmful  [9].  Consider  for  instance  the 
Vervel  monkey  alarm  calls  [37]:  misinterpreting  a  snake  alarm 
for  a  leopard  one,  and  hence  running  to  a  tree  instead  of  standing 
up  and  looking  in  the  grass,  is  clearly  much  better  than  misin¬ 
terpreting  it  for  an  eagle  call. 

Following Nowak et al. [8] and Zuidema [9], we can formalize
the notion of meaning similarity by introducing another
agent-independent matrix, the $n \times n$ value matrix $V$, so that
$v_{ij}$ yields the payoff attributed to an agent which infers
meaning i when the actual meaning the signaler intended to transmit
was j. Hence, the overall payoff for communication between agents
I and J becomes [9]

$$F(I,J) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} \left\{ \left[ P^{(I)} \times \left( E \times Q^{(J)} \right) \right]_{ji} + \left[ P^{(J)} \times \left( E \times Q^{(I)} \right) \right]_{ji} \right\} \tag{5}$$

where $\times$ stands for the usual matrix multiplication. Note that (1)
is recovered in the case that both value and confusion matrices
are diagonal.
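As an illustration of how a payoff of the form (5) can be evaluated, the following Python sketch computes it from the language matrices of two agents. It assumes the index conventions used in the text (P is n x m, Q is m x n, E is m x m, and V is n x n) and a symmetric confusion matrix, so that the order of the indices of E is immaterial. The function names `matmul`, `one_way_payoff`, `pair_payoff`, and `identity` are ours and do not appear in the original work.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def one_way_payoff(p, q, e, v):
    """Expected value when the owner of p signals and the owner of q listens.

    [P x (E x Q)][j][i] is the probability that intended meaning j is
    inferred as meaning i; v[i][j] is the value assigned to that outcome.
    """
    inferred = matmul(p, matmul(e, q))
    n = len(p)
    return sum(v[i][j] * inferred[j][i] for i in range(n) for j in range(n))


def pair_payoff(p_I, q_I, p_J, q_J, e, v):
    """Symmetrized payoff: each agent signals once and both share the reward."""
    return 0.5 * (one_way_payoff(p_I, q_J, e, v)
                  + one_way_payoff(p_J, q_I, e, v))


def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


# Two agents sharing the same one-to-one code, with no noise (E = identity)
# and no reward for near misses (V = identity).
n = 4
code = identity(n)
payoff = pair_payoff(code, code, code, code, identity(n), identity(n))
```

With identity value and confusion matrices the sketch reduces to counting correctly interpreted meanings, which is the situation described for (1).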

In particular, here we will consider the simple case in which
there is a nonzero probability $\epsilon \in [0, 1]$ that a signal,
say signal j, be mistaken for one of its nearest neighbors only,
$e_{j+1,j} = \epsilon/2$ and $e_{j-1,j} = \epsilon/2$. Of course, the
probability that a signal is not corrupted by noise is
$e_{jj} = 1 - \epsilon$. If signal j is in the boundary, j = 1 or
j = m, then we use the cyclic structure of the signal space to set
$e_{0,1} = e_{m,1} = \epsilon/2$ and $e_{m+1,m} = e_{1,m} = \epsilon/2$.
So, in the example of Fig. 1, signal 4 can be mistaken only for
signals 3 or 1 with probability $\epsilon$. Similarly, agents
are rewarded only if the inferred meaning is the intended meaning
or one of its nearest neighbors. For example, if the intended
meaning is j, then the only nonzero entries in column j of the value
matrix $V$ are $v_{jj} = 1$, $v_{j+1,j} = r$, and $v_{j-1,j} = r$.
Meanings in the boundary, j = 1 and j = n, are treated using the
cyclic boundary conditions as explained for the signal space. Here,
$r \in [0, 1]$ is a parameter that measures the advantage, in terms
of payoff, of using a compositional communication code rather
than a Saussurean one.
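The nearest-neighbor confusion and value matrices described above can be written down explicitly. The sketch below is a minimal Python construction under the cyclic boundary conditions of the text; it uses 0-based indices (signal 4 of Fig. 1 is index 3), and the function names `cyclic_confusion` and `cyclic_value` are ours.

```python
def cyclic_confusion(m, eps):
    """m x m matrix with e[i][j] = probability that signal j is observed as i."""
    e = [[0.0] * m for _ in range(m)]
    for j in range(m):
        e[j][j] += 1.0 - eps
        # "+=" keeps each column normalized even when both neighbors coincide.
        e[(j + 1) % m][j] += eps / 2
        e[(j - 1) % m][j] += eps / 2
    return e


def cyclic_value(n, r):
    """n x n matrix with v[i][j] = payoff for inferring i when j was intended."""
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v[(j + 1) % n][j] = r
        v[(j - 1) % n][j] = r
        v[j][j] = 1.0  # set last so the diagonal is never overwritten
    return v


# Four signals and four meanings; eps = 0.2 and r = 0.25 are sample values.
E = cyclic_confusion(4, 0.2)
V = cyclic_value(4, 0.25)
```

In this example the last signal (index 3) can only be observed as itself or as one of its cyclic neighbors (indices 2 and 0), each with probability eps/2.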

Together with the presence of noise, this last ingredient (a
nonzero reward for inferring a meaning close to the correct one)
should favor, in principle, the emergence of compositional
communication codes in an evolutionary game guided by Darwinian
rules. In what follows, we will show that the problem of evolving
efficient communication codes within an evolutionary framework,
whether or not noise is present, is more difficult than previously
realized [4], [16], [18]. This problem differs from usual
optimization problems tackled with evolutionary algorithms in that
the maximization of the average population payoff requires a
somewhat coordinated action of the agents. It is of no value for an
agent to exhibit the correct "genome" (i.e., the transmission and
reception matrices) if it cannot communicate efficiently with the
other agents in the population because they use different language
matrices.

The emergent view of compositionality adopted here differs
from the approach followed by Nowak et al. [29] to study the
evolution of syntactic (or combinatorial) communication. In that
work, the conditions under which syntax is advantageous over
nonsyntactic or holistic languages were determined, namely, when
the number of required signals to express the relevant meanings
exceeds some threshold value. (It should be noted that
combinatorial communication has its disadvantages too, since it
boosts the potential for deception [38].) However, the finding that
the adoption of a particular communication code is better for the
population, in that it yields a higher overall payoff, is no
guarantee that such a code will actually spread in the population.
On the contrary, in this contribution we show that the Allee
effect will prevent its spreading. Additional assumptions, such as
the semantic continuity of incremental learning proposed here,
seem to be necessary to guarantee the emergence of compositional
codes.

III.  Population  Dynamics 

We assume that the offspring learn their languages from their
parents. Were it not for the effect of errors during learning,
which results in small changes in the language matrices, the
offspring would be identical to their parents. Like mutations in
the genetic setup, these learning errors allow for the variability
of the agents, and thus for the action of natural selection.

We start with N agents (typically N = 100) whose binary
language matrices are set randomly. Explicitly, for each agent
and for each meaning $i = 1, \ldots, n$, we choose randomly an
integer $j \in \{1, \ldots, m\}$ and set $p_{ij} = 1$ and
$p_{ik} = 0$ for $k \neq j$. Similarly, for each signal
$j = 1, \ldots, m$, we choose an integer $i \in \{1, \ldots, n\}$
and set $q_{ji} = 1$ and $q_{jk} = 0$ for $k \neq i$. This
procedure guarantees that initially P and Q are independent
random probability matrices. Note that, in general, they are not
permutation matrices at this stage. To calculate the total payoff
of a given agent, say agent I, we let it interact with every other
agent in the population. At each interaction, the emitted signal
can be mistaken for one of the neighboring signals with
probability $\epsilon$. According to (5), at each communication
event (an interaction) agent I receives the payoff value 1/2 if the
receiver guesses the intended meaning of the signal that I has
emitted, the payoff value r/2 if the receiver's guess is one of the
nearest neighbors of the intended meaning, and the payoff value 0
otherwise. Of course, the receiver obtains the same payoff accrued
to agent I. Once the payoffs (or fitness values) of all N agents
are tabulated, the relative payoffs can be calculated according to
(3), and then used to select the agent that will contribute one
offspring to the next generation.

To keep the population size constant, we must eliminate one
agent from the population. To do that we will use two strategies:
1) to choose the agent to be eliminated at random, regardless
of its fitness value and 2) to use an elitist strategy which
eliminates the agent with the lowest fitness value. In both cases,
the recently produced offspring is spared from demise. The first
selection procedure is Moran's model of population genetics
[35]. Both procedures differ from the standard genetic algorithm
implementation [28] in that they allow for the overlapping of
generations, a crucial prerequisite for cultural evolution which
may be relevant when learning is allowed. In practice, however,
Moran's model does not differ from the parallel implementation
in which the entire generation of parents is replaced by that of
the offspring in a single generation. We define the generation
time t as the number of generations needed to produce N
offspring with the consequent elimination of the same number of
agents.
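The initialization and the two elimination strategies can be sketched in Python as follows. This is only an illustration under stated assumptions: the relative payoffs of (3) are taken to mean that a parent is drawn with probability proportional to its payoff (roulette-wheel selection), and all function names are ours.

```python
import random


def random_binary_matrix(rows, cols, rng):
    """Binary matrix with exactly one 1 per row, placed in a random column."""
    matrix = [[0] * cols for _ in range(rows)]
    for row in matrix:
        row[rng.randrange(cols)] = 1
    return matrix


def initial_population(N, n, m, rng):
    """N agents, each a pair (P, Q) with P of size n x m and Q of size m x n."""
    return [(random_binary_matrix(n, m, rng), random_binary_matrix(m, n, rng))
            for _ in range(N)]


def select_parent(fitness, rng):
    """Assumed form of (3): index drawn with probability proportional to fitness."""
    if sum(fitness) == 0:
        return rng.randrange(len(fitness))
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]


def select_victim(fitness, strategy, rng):
    """Index of the agent removed to make room for the new offspring.

    The offspring is not listed in `fitness`, so it is spared by construction.
    """
    if strategy == "moran":
        return rng.randrange(len(fitness))
    if strategy == "elitist":
        return min(range(len(fitness)), key=lambda i: fitness[i])
    raise ValueError(f"unknown strategy: {strategy}")


rng = random.Random(0)
population = initial_population(N=100, n=10, m=10, rng=rng)
```

One reproduction/elimination step then consists of calling `select_parent`, copying the parent (with learning errors), and overwriting the agent returned by `select_victim` with the offspring.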

Finally, to allow for the appearance of novel codes (or language
matrices) in the population, changes are performed independently
on the transmission and reception matrices of the offspring with
probability $u \in [0, 1]$. Explicitly, the transmission matrix P
is modified by changing randomly the signal associated with a
meaning that is also chosen at random, with probability u. A
similar procedure updates the reception matrix Q. Hence,
the probability that the same offspring has its transmission and
reception matrices simultaneously altered by errors is $u^2$ and
the probability that it will differ somehow from its parent is
$\mu = 1 - (1 - u)^2$. Henceforth, we will refer to $\mu$ as the
probability of error in language acquisition.
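A minimal Python sketch of this learning-with-error step is given below. It assumes binary matrices stored as lists of rows with a single 1 per row; for simplicity the new entry is drawn uniformly from all columns, so it may occasionally coincide with the old one, which is a simplification of ours. The function names are hypothetical.

```python
import random


def mutate_rows(matrix, u, rng):
    """With probability u, move the single 1 of one random row to a random column."""
    new = [row[:] for row in matrix]
    if rng.random() < u:
        row = rng.randrange(len(new))
        col = rng.randrange(len(new[row]))
        new[row] = [1 if k == col else 0 for k in range(len(new[row]))]
    return new


def learn_with_errors(parent, u, rng):
    """Offspring copies (P, Q); each matrix is altered independently."""
    p, q = parent
    return mutate_rows(p, u, rng), mutate_rows(q, u, rng)


def acquisition_error(u):
    """mu = 1 - (1 - u)^2: probability that at least one matrix is altered."""
    return 1.0 - (1.0 - u) ** 2


rng = random.Random(1)
parent = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
faithful_copy = learn_with_errors(parent, 0.0, rng)
altered_copy = learn_with_errors(parent, 1.0, rng)
```

For u = 0 the offspring is an exact copy of its parent, whereas u = 1 corresponds to the regime $\mu = 1$ in which every offspring undergoes a learning step with errors.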

To facilitate comparison between different evolutionary
algorithms, we define a properly normalized average payoff of the
population

$$G = \frac{1}{nN(N-1)} \sum_{I=1}^{N} \sum_{J \neq I} F(I,J) \tag{6}$$

so that $G \in [0, 1]$. The maximum value G = 1 is reached for
Saussurean codes in the case of noiseless communication.
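To make the normalization concrete, the sketch below evaluates the average payoff of a small population in the noiseless case. It assumes that the noiseless payoff (1) awards 1/2 to each agent for every meaning that is correctly interpreted, as described in the text, and that the sum is normalized by $nN(N-1)$ so that a homogeneous Saussurean population yields G = 1. The function names are ours.

```python
def pair_payoff(agent_I, agent_J):
    """Noiseless payoff: 1/2 per correctly interpreted meaning, in each direction."""
    (p_I, q_I), (p_J, q_J) = agent_I, agent_J
    n, m = len(p_I), len(q_I)
    forward = sum(p_I[i][j] * q_J[j][i] for i in range(n) for j in range(m))
    backward = sum(p_J[i][j] * q_I[j][i] for i in range(n) for j in range(m))
    return 0.5 * (forward + backward)


def normalized_payoff(population):
    """Average payoff over all ordered pairs of distinct agents, divided by n."""
    N = len(population)
    n = len(population[0][0])
    total = sum(pair_payoff(population[I], population[J])
                for I in range(N) for J in range(N) if I != J)
    return total / (n * N * (N - 1))


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
deaf = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  # every signal is read as meaning 1

g_saussurean = normalized_payoff([(identity, identity)] * 5)
g_deaf = normalized_payoff([(identity, deaf)] * 5)
```

The second population understands only one of its three meanings, so its normalized payoff drops to 1/3.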

In Fig. 2, we present the effect of the inaccuracy in language
acquisition on the average payoff of the population for the
simplest situation, namely, $\epsilon = 0$ (the receiver always
gets the original signal) and r = 0 (only inference of the correct
meaning is rewarded). The results show a stark difference between
the elitist and the usual evolutionary strategy in the way
they are affected by learning errors. Whereas the performance
of Moran's model is degraded for high error rates [39], reaching
the payoff of random binary matrices for $\mu = 1$, the elitist
strategy actually benefits from those errors and gets to the
maximum payoff for the highest possible error rate. In fact, for
small but nonzero values of the error rate, the communication
accuracy of the elitist strategy is practically constant and starts
to increase only after $\mu$ crosses some threshold value
$\mu \approx 0.02$. The performance of Moran's model, on the other
hand, indicates the existence of an optimum value of the learning
error for which the communication accuracy is maximum. Longer runs
do not show any significant change of the pattern illustrated in
Fig. 2. What enables the elitist strategy to take advantage of
errors is the overlapping of generations together with the
immediate removal of unfit agents from the population. This
combination prevents the accumulation of inefficient agents in the
population and the consequent degradation of the communication
performance observed in Moran's model. Moreover, by eliminating the
agent that performs worst in the language game, the elitist
strategy adds an extra kick to the selective pressure towards
better communication codes, in addition to the fitness regulation
of offspring production described in (3).

Fig. 2. Normalized average payoff G of the population as function of the
probability of error in language acquisition $\mu$ in the case of N = 100
agents communicating about n = 10 meanings using m = 10 signals. The
evolution was followed until t = 2 x 10^ for the elitist strategy (o) and
until t = 10^ for Moran's model ($\triangle$). The symbols represent the
average over 50 independent runs. The error bars are smaller than the
symbol sizes. For $\mu = 0$, we find G = 0.255 ± 0.005 for both strategies,
whereas for random language matrices we find G = 0.1 ± 0.0001. The other
parameters are $\epsilon = r = 0$. The search space is the $m^n \times n^m$
space spanned by the two independent binary probability matrices P and Q.

The reason the elitist strategy can guide the population to a
regime of practically perfect communication accuracy even in
the presence of a constant flux of inefficient mutants ($\mu = 1$)
is that a defective offspring, though spared from demise at birth,
will almost certainly be purged from the population in the next
step. We recall that a single generation comprises N such
generation/elimination steps. In this scheme, the population can
maintain at most a single defective agent, thus resulting in a
reduction of the maximum normalized payoff by a factor $1/(nN)$.
In view of the remarkable effectiveness of the elitist strategy in
maximizing the communication accuracy of the population, in what
follows we will present the results for that strategy only.

Fig. 3. Normalized average payoff for the elitist (o) strategy at
t = 2 x 10^ for 100 independent sample runs of the evolutionary dynamics.
These results are compared with that of a fully compositional code (solid
line) and of Saussurean codes (x). The parameters and search space are the
same as in Fig. 2 with $\mu = 1$, except that now we have included a
pressure for compositionality: the signals are corrupted with probability
$\epsilon = 0.2$ and the ratio between the payoffs for inferring a close
and the correct meaning is r = 0.25. The optimal, compositional code yields
$G \approx 0.85$ and the typical payoff of a Saussurean code is
$G \approx 0.80$.

Fig. 3 presents the average communication accuracy for
100 independent runs (populations) in a generic case in which
the parameters $\epsilon$ and r, which couple the dynamics with the
distances in the signal and meaning spaces, are nonzero. Now,
since the communication between any two agents is affected by
noise, we must adopt a slightly different procedure to evaluate
the payoff of the entire population. As before, we follow the
evolutionary dynamics (i.e., the differential reproduction and
learning-with-error procedures) until t = 2 x 10^, then we
store the language matrices of all N agents. Keeping these
matrices fixed, we evaluate the average population payoff in
100 contests. A contest is defined by the interaction between
all pairs of agents in the population. Actually, according to (5),
each interaction comprises two communication attempts, since
any given agent first plays the role of the emitter and then of the
receiver. Hence, a contest amounts to $N(N-1)$ communication
events. Of course, in the noiseless case ($\epsilon = 0$), the
payoff obtained would be the same in all contests. The procedural
changes are needed to average out the effects of noise. For
instance, in a single interaction two perfectly compositional
codes could perform worse than two holistic codes if, by sheer
chance, the signals happen to be corrupted only during the
interaction of the compositional codes. To avoid such spurious
effects, the payoffs resulting from the interactions between any
two agents are averaged over 100 different interactions.
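The need for averaging over many noisy communication events can be illustrated with a small Monte Carlo sketch in Python. The sampling scheme is an assumption of ours, not a description of the original simulations: at each event the intended meaning is drawn uniformly at random, the signal is shifted to one of its two cyclic neighbors with total probability eps, and both agents share a single one-to-one code stored as index lists. All names are hypothetical.

```python
import random


def noisy_signal(signal, m, eps, rng):
    """Shift the signal to a cyclic nearest neighbor with total probability eps."""
    if rng.random() < eps:
        return (signal + rng.choice((-1, 1))) % m
    return signal


def reward(inferred, intended, n, r):
    """1 for the correct meaning, r for a cyclic nearest neighbor, 0 otherwise."""
    gap = (inferred - intended) % n
    if gap == 0:
        return 1.0
    return r if gap in (1, n - 1) else 0.0


def average_event_payoff(signal_of, meaning_of, eps, r, events, rng):
    """Mean reward per communication event between two agents sharing one code."""
    n, m = len(signal_of), len(meaning_of)
    total = 0.0
    for _ in range(events):
        intended = rng.randrange(n)
        heard = noisy_signal(signal_of[intended], m, eps, rng)
        total += reward(meaning_of[heard], intended, n, r)
    return total / events


rng = random.Random(42)
compositional = list(range(10))  # meaning i <-> signal i preserves neighborhoods
estimate = average_event_payoff(compositional, compositional, 0.2, 0.25, 20000, rng)
```

A single event returns 1, r, or 0, so only the average over many events approaches the expected value, which for this neighborhood-preserving code is $1 - \epsilon(1 - r) = 0.85$ at the sample parameters used here.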

For the purpose of comparison, in Fig. 3 we also present the
results for a population of agents carrying the same perfectly
compositional code (C = 1), as well as for a similarly
homogeneous population of agents carrying identical Saussurean
codes. These are control populations that, in contrast to the
elitist populations, do not evolve. In the absence of noise, these
control populations would reach the maximum allowed payoff, G = 1.
We note that a perfectly compositional code is not a Saussurean
code, in the sense that the one-to-one mapping between meaning
and signals is not arbitrary. Compared with its performance in the
noiseless case (see Fig. 2), the elitist strategy seems to have
great difficulty finding even a Saussurean code, let alone the
optimal, perfectly compositional code. Actually, in the presence of
noise, the performance of the Saussurean code seems to pose an
upper limit to the performance of the elitist strategy by acting as
an attractor to the evolutionary dynamics.

Fig. 4. Compositionality of the code carried by the agent with the highest
payoff in the runs shown in Fig. 3. The compositionality of the perfect
compositional code is, by definition, C = 1. There is a slight tendency to
compositionality in the codes produced by the elitist (o) strategy as
compared with those of the Saussurean codes (x).

It is instructive to calculate the average payoff $G_c$ of a
population composed of identical agents carrying a perfect
compositional code. Consider the average payoff received by a given
agent, say I, in a very large number of interactions with one of
its siblings, say J. When I plays the signaler role its average
payoff is $(1 - \epsilon) \times 1/2 + \epsilon \times r/2$, which,
by symmetry, happens to be the same average payoff I receives when
it plays the receiver role. Since all agents are identical, the
expected payoff of any agent equals that of the population. Hence

$$G_c = 1 - \epsilon (1 - r). \tag{7}$$

We can repeat this very same reasoning to derive the average
payoff $G_s$ of a homogeneous population of Saussurean codes. In
this case, by playing the signaler, I receives the average payoff
$(1 - \epsilon) \times 1/2 + \epsilon \times 2/(n-1) \times r/2$,
where the factor $2/(n-1)$ accounts for the fact that the reward
r/2 is obtained only if the inferred meaning is one of the two
neighbors of the correct meaning. This reasoning is valid for
n > 2 only, since for n = 2 each meaning has a single neighbor, and
so there is no difference between Saussurean and compositional
codes. Taking into account the payoff received by I when playing
the receiver yields

$$G_s = 1 - \epsilon + \frac{2 \epsilon}{n - 1} r \tag{8}$$

for n > 2. Note that $G_c > G_s$ for n > 3. Similarly to the case
n = 2, the Saussurean codes for n = 3 are compositional codes
because of the cyclic boundary conditions in the meaning space.
In Fig. 4, we show the compositionality of the code carried by
the agent with the largest payoff value in each of the runs used
to generate the data of Fig. 3. Although there is a slight
tendency to compositionality in the codes produced by the elitist
strategy, it is fair to say that the pressure to generate
compositional codes has not worked as expected, despite the clear
advantage of such codes given the conditions of the experiment
(see Fig. 3). As pointed out, the reason for that might be that
the Saussurean codes act as barriers (local maxima) from which
the evolutionary dynamics cannot escape, thus preventing it from
reaching a perfect compositional code (global maximum).
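The closed-form payoffs (7) and (8) are easy to evaluate numerically. The short Python sketch below does so for the parameters quoted in the caption of Fig. 3; the function names are ours.

```python
def payoff_compositional(eps, r):
    """Eq. (7): G_c = 1 - eps * (1 - r)."""
    return 1.0 - eps * (1.0 - r)


def payoff_saussurean(eps, r, n):
    """Eq. (8), valid for n > 2: G_s = 1 - eps + 2 * eps * r / (n - 1)."""
    if n <= 2:
        raise ValueError("Eq. (8) requires n > 2")
    return 1.0 - eps + 2.0 * eps * r / (n - 1)


# Parameters of Fig. 3: eps = 0.2, r = 0.25, n = 10.
g_c = payoff_compositional(0.2, 0.25)
g_s = payoff_saussurean(0.2, 0.25, 10)
```

The difference $G_c - G_s = \epsilon r (n - 3)/(n - 1)$ vanishes for n = 3 and is positive for n > 3, in agreement with the remark following (8).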

Fig. 5. Average payoff resulting from 100 independent runs of the noisy
evolutionary language game with the search space restricted to permutation
matrices (o) as a function of the pressure for compositionality. The error
bars are smaller than the symbol sizes. The upper straight line is the
function $G_c = (1 + r)/2$ that yields the average payoff of a perfect
compositional code and the lower straight line is $G_s = 0.5 + 0.11r$ that
yields the average payoff of a Saussurean code (see (7) and (8)). The
parameters are $\epsilon = 0.5$, $\mu = 0.9$, N = 100, and n = m = 10.

The results depicted in Fig. 3 expose clearly the failure of
the language evolutionary framework to produce efficient
communication codes when the receiver must interpret noisy
signals. To rule out the possibility that the cause of such failure
was the initial unlikely decoupling between production and
interpretation, in the following, we will restrict the search space
to that of Saussurean codes. Hence, for any agent, the
transmission matrix P is a permutation matrix and the reception
matrix Q has entries given by $q_{ji} = 1$ if $p_{ij} = 1$ and 0
otherwise (Q is also a permutation matrix). The initial population
is composed of N agents adopting distinct Saussurean codes. To
guarantee that all new codes generated by mutations stay within our
search space, we modify the mutation procedure so that with
probability $\mu$ the signal associated with a randomly chosen
meaning, say i, is exchanged with the signal associated with
another randomly chosen meaning, say k. This corresponds to the
interchange of the rows i and k of the transmission matrix. The
reception matrix is then updated accordingly. The sole genetic
strategy we use in the forthcoming simulations is the elitist one,
in which the worst performing agent is replaced by the offspring of
the agent chosen by rolling the fitness wheel.
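The restricted mutation can be sketched in Python by storing a Saussurean code as a permutation: `signal_of[i]` is the signal assigned to meaning i, i.e., the column of the single 1 in row i of P. This compact representation and the function names are our own choices for illustration.

```python
import random


def swap_mutation(signal_of, mu, rng):
    """With probability mu, exchange the signals of two distinct random meanings."""
    child = list(signal_of)
    if len(child) > 1 and rng.random() < mu:
        i, k = rng.sample(range(len(child)), 2)
        child[i], child[k] = child[k], child[i]
    return child


def reception_from(signal_of):
    """Inverse map: meaning_of[j] is the meaning read off signal j.

    This is the list form of the rule q_ji = 1 if p_ij = 1 and 0 otherwise.
    """
    meaning_of = [0] * len(signal_of)
    for meaning, signal in enumerate(signal_of):
        meaning_of[signal] = meaning
    return meaning_of


rng = random.Random(0)
parent = list(range(10))
child = swap_mutation(parent, 1.0, rng)
child_reception = reception_from(child)
```

Because a swap maps a permutation onto another permutation, the offspring never leaves the space of Saussurean codes, and recomputing the inverse map keeps production and interpretation coupled.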

In Fig. 5, we show the results of the experiments with the
evolutionary search restricted to the space of permutation
matrices. The procedure we use here is the same as that employed to
draw Figs. 3 and 4: after the evolutionary dynamics has settled
to an equilibrium (i.e., all agents are using the same
communication code, except for single temporary mutants), the
resulting homogeneous population is left to interact for 100
contests and the average payoff is recorded. However, instead of
exhibiting the payoff obtained in each of the 100 independent runs
as in Fig. 3, we exhibit in Fig. 5 only the average payoff
calculated over those runs. Hence, to obtain each data point of
this figure we need to generate a set of data similar to that used
to draw Fig. 3. We choose as the independent variable the ratio
between the payoffs for inferring a neighbor of the correct meaning
and the correct meaning (r), which can be interpreted also as a
selective pressure for evolving compositional codes. For the sake
of comparison, Fig. 5 also shows the average payoffs of perfect
compositional and random Saussurean codes.

Fig. 6. Average compositionality of the 100 evolved communication codes
(o) whose payoffs are exhibited in Fig. 5, as well as of the same number of
Saussurean codes (x). The compositionality of a perfect compositional code
is C = 1 by definition. The linear fitting of the average compositionality
of the evolved codes yields a slope of $\approx 0.43$.

The results in Fig. 5 indicate that for r = 0, the performance
of the communication codes, regardless of whether random,
compositional or evolved, is identical. Explicitly, in this case,
we find $G = 1 - \epsilon$ for any one-to-one mapping. Since the
search space is now restricted to the space of permutation
matrices, it is not a surprise that the payoffs of the Saussurean
codes serve as lower bounds to those of the evolved codes. This
trivial finding should not be confused with the unexpected result
exhibited in Fig. 3, that the payoffs of the Saussurean codes serve
as upper bounds to the payoffs of the evolved codes when the
search space is enlarged to cover all binary language matrices.
The results in Fig. 5 show clearly that, despite the fact that
compositionality can greatly improve the communication payoff of
the population (see upper straight line in that figure), the
evolved codes fall short of taking full advantage of the structure
of the meaning-signal space to cope with the noise in the
communication. As a result, the evolved codes are far from the
optimal, perfect compositional codes, although they fare better
than the Saussurean codes. Fig. 6 explains the reason for that: the
evolutionary dynamics actually succeeded in producing partially
compositional codes, thus reducing the deleterious effects of
noise.

It is interesting that the payoffs of the Saussurean codes
increase when the pressure for compositionality increases [see
Fig. 5 and (8)], although they remain largely noncompositional
on average (see Fig. 6). The key to the explanation of this result
is found in Fig. 4, where we can see that half of the samples of
the random Saussurean codes exhibit a positive value of the
compositionality, which is then associated with a payoff value
greater than $1 - \epsilon$ (= 0.8 in that case), while the
representatives of the other half have a payoff of $1 - \epsilon$
at worst. It is clear that the resulting average payoff must be an
increasing function of r.

The reason that the evolutionary dynamics failed to produce
perfect compositional codes, despite their obvious advantage in
coping with noisy signals, is that once a nonoptimal communication
code has become fixed (or even almost fixed) in the population,
mutants carrying better codes cannot invade. In fact, those
mutants will most certainly do badly when communicating with
the resident agents and, as a result, will quickly be removed from
the population. As pointed out, this is essentially the Allee
effect of population dynamics.

Fig. 7. The evolution of the fraction f of agents carrying a perfect
compositional code in an experiment in which they compete against agents
carrying a Saussurean code. The parameters are $\epsilon = 0.5$, r = 0.25,
N = 100, and n = m = 10. The initial population is set so that (from top to
bottom) f = 0.8, 0.5, 0.42, 0.419, and 0.2.

The task faced by the evolutionary algorithm here is of an
essentially different nature from that tackled in typical
optimization problems in which the fitness of an agent is frequency
independent. In such a case, a fitter mutant can always invade
the resident population. To stress this phenomenon, Fig. 7
illustrates the competition between a fraction f of agents carrying
(the same) perfect compositional code and a fraction 1 - f of
agents carrying (the same) Saussurean code. This simulation is
implemented using the elitist procedure described before, except
that learning errors are not allowed, so that at any time an
agent can carry only one of the two types of codes set initially.
Alternatively, Fig. 7 can be interpreted as the competition
between two different strategies: the perfect compositional and the
holistic strategies. We can easily estimate the minimum fraction
$f_m$ of perfect compositional codes above which this strategy
dominates the population. It is simply

$$\frac{f_m}{1 - f_m} = \frac{G_s}{G_c} \tag{9}$$

with $G_c$ and $G_s$ given by (7) and (8), respectively. For the
parameters of Fig. 7, this estimate yields $f_m \approx 0.46$,
which, within statistical errors, is in very good agreement with
the single run experiment described in the figure. Repetition of
this experiment using Moran's model rather than the elitist
strategy leads to the same result, except that the fixation of the
winner strategy takes much longer, about 100 times longer than the
fixation times exhibited in Fig. 7.
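As a numerical check of this estimate, the following Python sketch solves $f_m/(1 - f_m) = G_s/G_c$ for $f_m$, using $G_c = 1 - \epsilon(1 - r)$ and $G_s = 1 - \epsilon + 2\epsilon r/(n - 1)$ as read from (7) and (8). The function name `invasion_threshold` is ours.

```python
def invasion_threshold(eps, r, n):
    """Minimum fraction f_m of compositional agents, from f/(1 - f) = G_s/G_c."""
    g_c = 1.0 - eps * (1.0 - r)
    g_s = 1.0 - eps + 2.0 * eps * r / (n - 1)
    return g_s / (g_s + g_c)


# Parameters quoted in the caption of Fig. 7: eps = 0.5, r = 0.25, n = 10.
f_m = invasion_threshold(eps=0.5, r=0.25, n=10)
```

For these parameters $G_c = 0.625$ and $G_s \approx 0.528$, so that $f_m \approx 0.46$, the value quoted in the text.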

This simple analysis of the competition between suboptimal
Saussurean codes and the optimal compositional codes lends
support to our previous conclusion that compositional codes do
not evolve within the usual language evolutionary game framework
because the evolutionary dynamics is very likely to get
trapped in the local maxima, namely the Saussurean codes.

IV.  Incremental  Meaning  Assimilation 

What we have been trying to do up to now is to evolve in a
single shot a communication code that associates each of the
n meanings (or objects) to one of the m signals available in
the repertoire of the agents. As pointed out, in the case that the
meaning-signal mapping has a nontrivial underlying structure,
the optimal association is not completely arbitrary in the sense
that in the presence of noise some codes (i.e., the perfect
compositional codes) result in a much better communication accuracy
than codes that implement an arbitrary one-to-one correspondence
between meaning and signals (Saussurean codes). The
results of the previous simulations lead us to conclude that it
is very unlikely, if not impossible, that evolution through
natural selection alone could take advantage of the structure of
the meaning-signal space to produce the optimal, perfect
compositional codes.

The outcome would be very different, however, if the task
posed to the population were to reach a consensus on the signals
to be assigned to the meanings in a sequential manner. In other
words, let us consider the situation in which each agent has m
signals available (here we set m = 10) and the population needs
to communicate about a single meaning, say i = 1. The search
space is then reduced to the space of the 1 × m permutation
matrices. (We restrict the search space to that of permutation
matrices, for simplicity.) Once the consensus is reached (i.e., the
signal assigned to meaning i = 1 is fixed in the population), a
new meaning is presented and the population is then challenged
to find a consensus signal for that meaning. The procedure is
repeated until each of the n = m meanings is associated to a
unique signal.

In the case of structured meaning-signal mappings, the order
of presentation of meanings to the population plays a crucial
role in the outcome of this strategy, which we term sequential
meaning assimilation. In particular, success is guaranteed only
if the novel meaning is a neighbor of the previously presented
meaning (e.g., i = 2 or i = n in the case the previously
assimilated meaning was i = 1). In this case, the question is whether
the population will reach a consensus on a signal that is also a
neighbor of the signal assigned to the previous meaning. Curve
(a) of Fig. 8 shows that this scheme works neatly, and yields
a fully compositional code provided that $\epsilon \neq 0$ and $r \neq 0$.
We note that when the number of assimilated meanings is less
than the size of the repertoire of signals m, the payoff of the
sequential assimilation scheme [curve (a)] falls below the average
payoff of a fully compositional code (dashed horizontal line),
because until all meanings are presented, the codes produced by
that scheme cannot take full advantage of the topology of the
meaning and signal spaces. The following example explains why
this is so. Consider the situation in which two meanings
were assimilated, say i = 1, 2, and the signals assigned to them
were j = 6, 7, respectively. The agents will receive no reward
if the corrupted signals become 5 or 8 (we recall that m = 10
in this experiment), since at this point there are no meanings
associated to these altered signals. In contrast, reward is always

Fig. 8. Average payoff of the population when the task is to produce consensus
signals to n meanings presented sequentially at the time intervals $\Delta t = 100$. In
curve (a), the new meaning is a neighbor of the previous one, whereas in curve
(c), the order of presentation of the meanings is random. The result for the usual
batch algorithm, in which all meanings are presented simultaneously, is shown
in curve (b). The dashed horizontal line indicates the average performance of
perfect compositional codes. The parameters are $\epsilon = 0.5$, r = 0.5, N = 100,
and n = m = 10.


guaranteed for the fully formed compositional code since, by
definition, all meanings are assimilated at the very outset in this
case. Of course, as seen in Fig. 8, this “surface” effect is
attenuated as more meanings are assimilated. The fact that the
final payoff of the single run displayed in curve (a) ends up
being greater than the (theoretical) average payoff of the perfect
compositional code is simply a statistical fluctuation. Curve
(c) in Fig. 8 illustrates the failure of the sequential presentation
scheme when the order of presentation of meanings is random.
In fact, if the meanings are presented in an arbitrary order, say
i = 3 after i = 1, then there is no selection pressure to prevent
the signal assigned to i = 3 from being one of the neighbors of the
signal associated to i = 1. Eventually, when the meaning i = 2
is presented, this optimal signal will be unavailable to the agents,
thus precluding the emergence of a compositional code. Finally,
we note that the incremental learning scheme would work all the
same if the repertoire of meanings were left fixed and the signals
were presented one by one.
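
The role of the presentation order can be illustrated with a toy model. The sketch below is not the evolutionary simulation used in this paper: it replaces the population dynamics by a greedy rule in which each new meaning receives the free signal that maximizes the expected reward, with ties broken in favor of the lowest signal label. Meanings and signals are labeled 0 to n − 1 on a ring, and all function names are hypothetical.

```python
def ring_distance(a, b, n):
    """Distance between two labels on a ring of n sites."""
    d = abs(a - b) % n
    return min(d, n - d)


def total_payoff(code, n, eps, r):
    """Summed expected reward over the meanings assimilated so far.

    code maps meaning -> signal. A signal is heard correctly with
    probability 1 - eps and as either ring neighbor with probability eps/2.
    """
    decode = {s: i for i, s in code.items()}
    total = 0.0
    for meaning, signal in code.items():
        channel = ((signal, 1 - eps),
                   ((signal - 1) % n, eps / 2),
                   ((signal + 1) % n, eps / 2))
        for heard, prob in channel:
            guess = decode.get(heard)  # None: no meaning uses this signal yet
            if guess is None:
                continue
            d = ring_distance(guess, meaning, n)
            total += prob * (1.0 if d == 0 else r if d == 1 else 0.0)
    return total


def assimilate(order, n, eps=0.5, r=0.5):
    """Greedy stand-in for consensus: assign signals one meaning at a time."""
    code = {}
    for meaning in order:
        free = [s for s in range(n) if s not in code.values()]
        # max() returns the first maximizer, so ties go to the lowest free signal.
        code[meaning] = max(
            free, key=lambda s: total_payoff({**code, meaning: s}, n, eps, r))
    return code


def is_structured(code, n):
    """True if neighboring meanings always carry neighboring signals."""
    return all(ring_distance(code[i], code[(i + 1) % n], n) == 1
               for i in range(n))


n = 10
neighbor_order = list(range(n))                  # each meaning neighbors the last
out_of_order = [0, 2, 1] + list(range(3, n))     # meaning 2 arrives before meaning 1
print(is_structured(assimilate(neighbor_order, n), n))   # True
print(is_structured(assimilate(out_of_order, n), n))     # False
```

In the second run nothing favors any particular signal for meaning 2 when it arrives, so it takes the signal next to that of meaning 0, and meaning 1 can no longer be placed between them.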

The proposed solution to the evolution of compositional
codes in an evolutionary language game framework could be
questioned, because it relies on the assumption that the new
meanings entering the population repertoire must be closely
related to the already assimilated meanings. However, this
seems to be the manner in which the perceptual systems work
during categorization: new meanings are usually hierarchically
related to the assimilated ones and this could be, for instance,
the reason for Zipf's law of languages [40], [41]. In fact, as
pointed out in [33], the hierarchical structure of language may
be caused by our perception of reality, rather than the other
way around. The case for a hierarchically organized world was
made by Simon [42]:

“On theoretical grounds we could expect complex systems
to be hierarchies in a world in which complexity had
to evolve from simplicity.”


In  addition,  the  evidence  that  nouns  are  easily  changed  into 
verbs  (e.g.,  ship-shipped,  bottle-bottled)  [43]  illustrates  the 
same  type  of  continuity  in  the  signal  space  as  well. 

In any event, our solution is in line with the traditional
Darwinian explanation for the evolution of the so-called irreducibly
complex systems. Although the evolutionary game setting failed
to evolve perfect compositional codes when the task was to
produce a meaning-signal mapping by assimilating all meanings
simultaneously, that setting proved successful when the
meanings were created gradually.

V.  Conclusion 

Saussure's notion of language as a contract signed by members
of a community to set arbitrarily the correspondence between
words and meanings leads to unexpected obstacles to the
evolution of efficient communication codes in the evolutionary
language game framework. In fact, the fixation of a communication
code in a population is a once-and-for-all decision: it cannot
be changed even if a small fraction of the population acquires a
different, more efficient code (see Fig. 7). The situation here is
similar to the evolutionarily stable strategies of game theory [27],
the escape from which is only possible if all players change their
strategies simultaneously. Since such concerted, global changes
are not part of the rules of the language game, there seems to be
no way for the population to escape from nonoptimal
communication codes.

In fact, languages evolve. A branch of linguistics named
glottochronology (the chronology of languages) suggests the rule
of thumb that languages replace about 20% of their basic
vocabulary every 1000 years [44]. The abovementioned difficulty
of changing the communication code is not in the replacement
of old signals by new ones, but in the assignment of different
meanings to old signals and vice versa. Of course, this would
not be an issue if the evolutionary language game could lead
the population to the optimal code (a perfectly compositional
code, in our case); our simulations have shown that it always
gets stuck in one of the local maxima that plague the search
space. To point out this difficulty was, in fact, the main goal of
the present contribution.

Our view of compositionality as the evolutionary stage
following the settlement of simpler, unstructured communication
codes, and the search for a continuous path connecting
these two stages, led us to the same type of difficulties that
researchers working on a similarly elusive problem, the origin
of life, have been struggling with for more than three decades
[39]. For example, although the coordinated work of distinct
genes is germane to the emergence of cells, it is still not clear
how such an assemblage could be formed and maintained
starting from selfish genes (see [45] for a review). In that
sense, by exposing the obstacles to explaining compositionality
from an evolutionary perspective, our work follows the same
research vein that led to the present understanding of prebiotic
evolution.

The solution we put forward to this conundrum is a
conservative one: we cannot explain the emergence of the entire
meaning-signal mapping that displays the required compositional
property via natural selection, but it is likely that the mapping
was formed gradually with the addition of one meaning
at a time. This gradual procedure, which we term incremental
meaning creation, leads indeed to fully compositional codes (see
Fig. 8). It would be interesting to verify whether alternative,
less conservative solutions such as the spatial localization of the
agents, less than perfect metrics in meaning space, or the
structuring of the population by age could lead to the dissolution of
the language contract and so open an evolutionary pathway to
more efficient communication codes.
Abstract

Structured meaning-signal mappings, i.e., mappings that preserve neighborhood relationships by associating similar signals with
similar meanings, are advantageous in an environment where signals are corrupted by noise and sub-optimal meaning inferences
are rewarded as well. The evolution of these mappings, however, cannot be explained within a traditional language evolutionary
game scenario in which individuals meet randomly because the evolutionary dynamics is trapped in local maxima that do not
reflect the structure of the meaning and signal spaces. Here we use a simple game theoretical model to show analytically that
when individuals adopting the same communication code meet more frequently than individuals using different codes - a result of
the spatial organization of the population - then advantageous linguistic innovations can spread and take over the population. In
addition, we report results of simulations in which an individual can communicate only with its K nearest neighbors and show that
the probability that the lineage of a mutant that uses a more efficient communication code becomes fixed decreases exponentially
with increasing K. These findings support the mother tongue hypothesis that human language evolved as a communication system
used among kin, especially between mothers and offspring.


Key words: Evolution of communication; Population dynamics; Inverse density dependence; Evolutionary games


1.  Introduction 

The notion that words compete and languages evolve in
analogy to individuals and populations was already familiar
in the nineteenth century, as expressed in this quotation
by the philologist Max Müller, a famous contemporary of
Darwin: “A struggle for life is constantly going on amongst
the words and grammatical forms in each language. The
better, the shorter, the easier forms are constantly gaining
the upper hand, and they owe their success to their own
inherent virtue” (Radick, 2002). A more suitable analog to
language, however, is that of a parasitic species since
language does not exist without speakers, just like parasites
do not exist without hosts (Mufwene, 2001). In fact, the
propagation of linguistic innovations through a population
depends solely on the interaction between individuals and,
as we will show here, the meeting practices of the speakers
can hamper or facilitate the spread of new words or
grammatical forms, regardless of their worth.

The debate on language evolution has centered mainly on
the apparent gap between animal communication systems
and human language (see, e.g., Pinker and Bloom (1990)).
In fact, (non-human) animals use non-syntactic or holistic
communication codes, i.e., signals refer to whole situations,
in contrast to human language, which is characterized by
signals formed by discrete components that have their own
meaning. As pointed out by Deacon (1997), no “simple”
language which uses some elementary form of syntax or
word combination has ever been found either in humans
or in animals (see, however, Gordon (2004) for a possible
exception - the puzzling language of the Pirahã people,
which lacks subordinate clauses as well as words associated
with time, color and numbers). This discontinuity is behind
the notion of a “language organ” exclusive to the human
species which was originally designed to carry out
combinatorial calculations (Chomsky, 1972; Fodor, 1983).
According to Chomsky (1972), language is an example of true
emergence - the appearance of a qualitatively different
phenomenon at a specific stage of complexity of organization.
The burden of identifying the selective pressures accountable
for the emergence of syntax falls to those who hold
the biologically oriented perspective that human language
has evolved gradually from a simpler precursor - a
protolanguage - by means of the usual natural selection process.
The demands of the social life of early hominids have been
pointed to as a probable source of selective pressures for the
evolution of syntactic communication (Dunbar, 1996).

Even the evolution of simple holistic communication,
which can be viewed as a one-to-one mapping between
meanings and signals, has to confront some fundamental
difficulties (Dawkins and Krebs, 1978; Fitch, 2004). In
fact, from the perspective of the signaller, passing useful
information to another individual is an altruistic act and
so its maintenance in nature is problematic, whereas from
the receiver's viewpoint deciding whether a signal is honest
(in the sense of conveying accurate information) or not is
a difficult problem, the solution of which is thought to
depend on the cost paid by the signaller to emit the signal
(Zahavi, 1993). This is the essence of the “handicap
principle”, namely, honest signals are retained only when the
signaller pays a high cost when emitting them (Zahavi, 1975).
The relevance of this principle to the evolution of
communication, however, has been challenged by Noble (2000), who
showed that a necessary condition for efficient communication
to evolve is that both sender and receiver benefit
equally in the case of mutual understanding. By an efficient
communication code we mean a Saussurean communication
system that maps meanings unambiguously onto
signals and then back into the original meanings (Hurford,
1989; Oliphant, 1996).

Rather than focusing on the evolution of Saussurean
communication (see Hurford (1989); MacLennan (1991);
Nowak and Krakauer (1999); Nowak et al. (1999); Oliphant
(1996); Noble (2000) for work along this line), in this paper
we assume that one such code is already established in the
population and study the conditions under which a more
robust communication system can take over. The breaking
of the degeneracy between distinct Saussurean codes -
essentially the $n!/(n - m)!$ different manners to associate m
meanings to n > m signals - is achieved by introducing
errors in the perception of signals as well as by rewarding
the inference of meanings close to the intended ones
(Nowak and Krakauer, 1999; Zuidema, 2003; Fontanari
and Perlovsky, 2007). This amounts to considering
structured meaning-signal mappings in which neighborhood
relationships are preserved (see Sect. 2).

We take up the evolutionary language game approach
(Nowak et al., 1999) to study the competition between
two communication codes or strategies: a perfectly
structured meaning-signal mapping (strategy 1) and a random
meaning-signal mapping (strategy 2). This study is
primarily motivated by the failure of the traditional language
game scenario to explain the evolution of structured
communication codes starting from a population composed of
individuals who use distinct communication codes, so the
chance that a signal emitted by an individual is correctly
interpreted by another individual is 1/m (Fontanari and
Perlovsky, 2007). This is so because the evolutionary
dynamics is very likely to get trapped in the local maxima -
the random meaning-signal mappings - and once a
communication code is fixed in the population it cannot be changed
even if a small fraction of the population adopts the more
efficient structured code. This is essentially the Allee
effect (Allee, 1931) of population dynamics, which asserts that
intraspecific cooperation might lead to inverse density
dependence, resulting in the extinction of some (social)
animal species when their population size becomes small. Of
course, this effect is germane to the outcome of biological
invasions involving such species.

Instead of using the genetic algorithm to simulate the
population dynamics, here we use an analytical approach
based on the game theoretical formulation of Eshel and
Cavalli-Sforza (1982), which allows us to derive explicit
conditions for the minimum size of the population that
adopts a structured code to invade an established population
of individuals adopting a sub-optimal communication
system. In particular, we show that useful linguistic innovations
can spread and take over the population if the meeting
of individuals using the same communication strategy
is more likely than the encounter of individuals using
different strategies - a natural consequence of imposing a
spatial structure on the population since individuals are more
likely to communicate with those close to them than with
those farther away. Additional support for this finding is
obtained through the explicit simulation of a spatially
organized population in which the individuals can interact with
their K nearest neighbors only. Our findings support the
“mother tongue” assumption that human language evolved
as a communication system used among kin, especially
between parents and their offspring (Fitch, 2004).

2.  Meaning-signal  mapping 

Here we adopt the view that language is a mapping
between meanings (or objects) and signals. In most previous
studies of evolutionary language games this mapping is
structureless or random: the metrics (if any) of the meaning
and signal spaces play no role in the properties of the
mapping and hence in the nature of the evolved communication
codes (Hurford, 1989; MacLennan, 1991; Nowak and
Krakauer, 1999; Nowak et al., 1999; Oliphant, 1996; Noble,
2000). This contrasts with a more recent approach that
puts emphasis on the properties of the meaning-signal
mapping and, in particular, focuses on structured mappings that
preserve neighborhood relationships, i.e., nearby meanings
in the meaning space are likely to be associated to nearby
signals in signal space (Smith et al., 2003; Zuidema, 2003;
Brighton et al., 2005; Fontanari and Perlovsky, 2007).

This notion of structured mappings seems contradictory
to the well-established fact that the relation between a word
(signal) and its meaning is arbitrary (Pettito, 1994). In
fact, as pointed out by Pinker (1994), “babies should not,
and apparently do not, expect cattle to mean something
similar to battle, or singing to be like stinging, or coats to
resemble goats”. On the other hand, a code that preserves
neighborhood relationships is clearly advantageous in an
environment where signals are likely to be altered by noise.
Consider for instance the Vervet monkey alarm calls
(Seyfarth et al., 1980): misinterpreting a snake alarm for a
leopard one, and hence running to a tree instead of standing up
and looking in the grass, is clearly much better than
misinterpreting it for an eagle call. In addition, sentences like
John walked and Mary walked have parts of their semantic
representation in common (someone performed the same
act in the past) and so the meanings of these sentences must
be close in the meaning space. Since both sentences contain
the word walked they must necessarily be close in signal
space as well (Smith et al., 2003; Brighton et al., 2005). It
should be noted that the very notion of meaning similarity
in contraposition to meaning identity is a highly controversial
issue in cognitive science (see, e.g., Churchland (1998);
Fodor and Lepore (1999); Abbott (2000)). However, within
a connectionist perspective in which meanings are neural
activation patterns, the concept of meaning similarity
follows naturally. In this contribution, we take the stand that
in simple (nonhuman) communication structured
meaning-signal mappings are likely to be relevant even at the
elementary level of the object-word pairing, whereas in human
language these mappings may play a role at the
meaning-sentence level only.

We represent the signals (sentences or words) as well as
the meanings by single symbols (labels) - only the
“distance” between these entities will reflect the complex inner
structure of the signal and meaning spaces. For instance,
suppose there are only two words that we represent, without
loss of generality, by 0 and 1 so that a binary sequence
or, equivalently, its decimal representation stands for any
sentence in this language. Here the relevant distance
between two such sentences is the Hamming distance rather
than, e.g., the result of the subtraction between their
labeling integers. This notion, of course, generalizes trivially
to the case where the sentences are composed of more than
two types of words. As pointed out before, the representation
of meanings is a much vaguer issue, but within a
connectionist stand we can think of meanings also as patterns
of 1s and 0s representing the arrangement of active and
inactive neurons in the neural region activated by the signal.

For simplicity, in this paper we consider the case where
both signals and meanings are represented by integer
numbers and the relevant distance in both signal and meaning
space is the result of the usual subtraction between
integers. In addition, we consider the case where the number
of signals equals the number of meanings, m = n. Figure
1 illustrates a structured meaning-signal mapping in the
case of n = 5 signals. For n signals there are only 2n
structured mappings out of the n! possible mappings. A
random mapping is obtained simply by assigning meanings
to signals randomly. A quantitative measure of the
structure of a mapping is given by the degree to which the
distances between all the possible pairs of meanings correlate
with the distances between their corresponding pairs of
signals, a quantity known as Pearson's correlation coefficient
(Brighton et al., 2005). Since here we will focus on the
competition between two communication strategies given
a priori - structured and random mapping - we will not
consider the partially structured mappings which play a
fundamental role when the issue is the emergence of
structured mappings from an initial population of random
mappings (Fontanari and Perlovsky, 2007).

Fig. 1. Illustration of a meaning-signal mapping for n = 5. The
integers here are viewed as labels for complex entities (e.g., sentences).
The metric of the signal space is such that signal 3 is one unit distant
from signals 2 and 4. This space has periodic boundary conditions
so that signal 5 is one unit distant from signal 1. This metric
applies to the meaning space as well. Because nearby meanings in the
meaning space are associated to nearby signals in signal space this
is a structured mapping.
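
The count of 2n structured mappings and the correlation measure can be checked by brute force for small n. The sketch below assumes the ring metric of Fig. 1 for both spaces, labels meanings and signals 0 to n − 1 for convenience, and uses hypothetical function names; it is an illustration rather than the procedure of Brighton et al. (2005).

```python
from itertools import combinations, permutations
from math import factorial, sqrt


def ring_distance(a, b, n):
    """Distance on a ring of n sites (periodic boundary conditions)."""
    d = abs(a - b)
    return min(d, n - d)


def is_structured(mapping, n):
    """True if neighboring meanings always map to neighboring signals."""
    return all(ring_distance(mapping[i], mapping[(i + 1) % n], n) == 1
               for i in range(n))


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)  # undefined for n = 3, where all distances are 1


def structure_correlation(mapping, n):
    """Correlation between meaning distances and signal distances."""
    pairs = list(combinations(range(n), 2))
    d_meaning = [ring_distance(i, j, n) for i, j in pairs]
    d_signal = [ring_distance(mapping[i], mapping[j], n) for i, j in pairs]
    return pearson(d_meaning, d_signal)


n = 5
structured = [p for p in permutations(range(n)) if is_structured(p, n)]
print(len(structured), "structured mappings out of", factorial(n))
print(structure_correlation(structured[0], n))
```

For n = 5 the enumeration finds 10 = 2n structured mappings (the rotations and reflections of the ring), each with correlation 1 up to rounding.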

A mapping that preserves the topology of the meaning
and signal spaces was termed a compositional mapping
in previous works (Smith et al., 2003; Zuidema, 2003;
Brighton et al., 2005; Fontanari and Perlovsky, 2007).
Here we use the term structured mapping instead, to avoid
confusion with the well-established concept of compositionality,
which is defined as the property that the meaning of
a complex expression is determined by the meanings of its
parts and the rules used to combine them. In fact, Fodor
and Lepore (1999) even claim that the notion of meaning
similarity excludes the possibility of compositionality (see,
however, Abbott (2000)). In an artificial scenario in which
there is a prescription to derive the meaning of the whole
given the meaning of the elementary parts, however, there
is a direct connection between structured and compositional
meaning-signal mappings since in this case the distance
between any two composite meanings could be inferred by
comparing their components and, consequently, by
introducing a metric in the meaning space.
3. Strategy payoffs


To explore the structure of the meaning-signal mapping
(see Fig. 1) we must admit the possibility of errors in the
perception of the signals as well as the alternative of
rewarding the inference of meanings close but not equal to
the meaning intended by the signaller.

It is reasonable to assume that in the earlier stages of the
evolution of communication the signals were likely to be
noisy and so they could be easily mistaken for each other.
The relevance of the structure of the signal space becomes
apparent when we note that the closer two signals are, the
higher the chances that they are mistaken for each other.
In particular, here we will consider the simple case in which
there is a nonzero probability $\epsilon \in [0, 1/2]$ that a signal,
say signal j, be mistaken for one of its nearest neighbors
j − 1 or j + 1. So, in the example of Fig. 1, signal 5 can
be mistaken for signal 4 with probability $\epsilon/2$ or for signal
1 with probability $\epsilon/2$. Of course, the probability that a
signal is not corrupted by noise is $1 - \epsilon$.
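
The noise model just described can be written down in a few lines. The sketch below assumes signals labeled 1 to n on a ring with n ≥ 3, as in Fig. 1; the function names are hypothetical.

```python
import random


def perceived_distribution(j, n, eps):
    """Probability of each perceived signal when signal j is emitted."""
    if n < 3 or not 1 <= j <= n or not 0 <= eps <= 0.5:
        raise ValueError("need n >= 3, 1 <= j <= n and 0 <= eps <= 1/2")
    left = n if j == 1 else j - 1    # periodic boundary conditions
    right = 1 if j == n else j + 1
    return {j: 1 - eps, left: eps / 2, right: eps / 2}


def perceive(j, n, eps, rng=random):
    """Draw one perceived signal from the distribution above."""
    dist = perceived_distribution(j, n, eps)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


# Signal 5 in the n = 5 example: heard as 4 or as 1 with probability eps/2 each.
print(perceived_distribution(5, 5, 0.2))
```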

The individuals in the population can adopt either
strategy 1 (structured meaning-signal mapping) or strategy 2
(random meaning-signal mapping). The interaction - a
communication event - between a pair of individuals, say
individuals I and J, comprises two stages: first I plays the
role of signaller (so J is the receiver) and then I and J
exchange roles. Both individuals receive the same payoff
value. In particular, we assume that both signaller and
receiver are rewarded with 1/2 unit of payoff whenever
the receiver interprets correctly the meaning of the emitted
signal. In addition, both agents are rewarded with r/2
unit of payoff, where $r \in [0, 1]$, if the inferred meaning is
one of the nearest neighbors of the intended meaning. We
note that giving value to decisions which are not the best
ones is a common assumption in decision and game theory
(Fudenberg, 1991) and it seems to be consistent with
what is actually observed in nature since, as illustrated
by the Vervet monkey alarm calls example, not every
misinterpretation is equally harmful (see, e.g., Zuidema
(2003)). The factors 1/2 appear here because, as pointed
out before, a communication event comprises two stages
in which the individuals interchange the roles of signaller
and receiver. So, both individuals gain 1 unit of payoff in
case communication was successful in both stages.

Next we calculate the average payoff accrued to a pair of individuals
during a communication event. First, let us consider the interaction
between two individuals who both have strategy 1. The average payoff of
the individual playing the signaller is
$(1 - \epsilon) \times 1/2 + \epsilon \times r/2$ which, by symmetry,
happens to be the same average payoff it receives when playing the
receiver role. Hence

$$F_{11} = 1 - \epsilon (1 - r). \qquad (1)$$

In the case both individuals have strategy 2, the average payoff of the
signaller is
$(1 - \epsilon) \times 1/2 + \epsilon \times 2/(n-1) \times r/2$, where
the factor $2/(n-1)$ accounts for the fact that the reward $r/2$ is
obtained only if the inferred meaning is one of the two neighbors of
the correct meaning. Hence the average payoff accrued to both
individuals in a communication event is

$$F_{22} = 1 - \epsilon + \frac{2\epsilon}{n-1}\, r. \qquad (2)$$

This reasoning is valid for $n > 2$ only: for $n = 2$ each meaning has
a single neighbor and so the correct expression is
$F_{22} = 1 - \epsilon (1 - r)$. Finally, in the case the individuals
have different strategies the probability that the receiver correctly
infers the signaller's intention is simply $1/n$ and the probability
that it infers a meaning which is a neighbor of the intended one is
$2/(n-1)$. The average payoff of this communication event is then

$$F_{12} = \frac{1}{n} + \frac{2r}{n-1} \qquad (3)$$

and $F_{21} = F_{12}$.

For $n \ge 3$ we have $F_{11} \ge F_{22}$, where the equality holds for
$n = 3$ as well as for the trivial cases $\epsilon = 0$ or $r = 0$. In
addition, $F_{11} > F_{12}$ for $n > 2$. We will show in the following
section that, except for $n = 4$ and $\epsilon$ close to its maximum
value 1/2, we have $F_{22} > F_{12}$. These inequalities are important
to determine the local stability of the two strategies.
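The three average payoffs can be evaluated numerically to check the inequalities just stated. The sketch below is only an illustration: the function name `payoffs` is hypothetical, and the mixed-strategy payoff is assembled from the probabilities $1/n$ and $2/(n-1)$ quoted in the text, which is an assumption about the form of Eq. (3).

```python
def payoffs(n, eps, r):
    """Average payoffs per communication event, for n > 2.

    Returns (F11, F22, F12); F21 equals F12 by symmetry.
    """
    f11 = 1.0 - eps * (1.0 - r)                # both use the structured code
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)  # both use the random code
    f12 = 1.0 / n + 2.0 * r / (n - 1)          # mixed pair (assumed form)
    return f11, f22, f12


if __name__ == "__main__":
    for n in (4, 5, 10):
        f11, f22, f12 = payoffs(n, eps=0.5, r=0.5)
        print(n, round(f11, 4), round(f22, 4), round(f12, 4))
```

Evaluating `payoffs(4, 0.5, 1.0)` gives a case where $F_{22} < F_{12}$, the exception mentioned for $n = 4$ and large noise.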

4.  Population  dynamics 

As pointed out by Ferdinand de Saussure, "language is not complete in
any speaker; it exists only within a collectivity... only by virtue of
a sort of contract signed by members of a community" (Saussure, 1966).
Translated into the biological jargon, this assertion means that
language is not the property of an individual, but the extended
phenotype of a population (Nowak et al., 2002). So a suitable approach
to language evolution must take into account the population dynamics.
In what follows we build on the game theoretical formulation of Eshel
and Cavalli-Sforza (1982) to investigate analytically the evolution of
structured communication codes.

Let $x \in [0, 1]$ be the proportion of individuals in a population of
infinite size that use the structured communication code (strategy 1).
To calculate the expected payoff of individuals adopting a particular
strategy we need to make some assumption about the frequency of
encounters between any two individuals. Let $u_{ij}$ with
$i, j = 1, 2$ be the probability that an individual using strategy $i$
encounters an individual that uses strategy $j$. Since the game rules
are such that an individual must encounter a partner to interact with,
we have $u_{i1} + u_{i2} = 1$ for $i = 1, 2$. In addition, since the
average number of encounters between individuals using different
strategies can be written either as $x u_{12}$ or $(1 - x) u_{21}$ we
have the equality $u_{12}/u_{21} = (1 - x)/x$. Hence a single encounter
probability, say $u_{11}$, determines all other encounter
probabilities: $u_{12} = 1 - u_{11}$,
$u_{21} = x (1 - u_{11})/(1 - x)$, and
$u_{22} = (1 - 2x + x u_{11})/(1 - x)$. In the case encounters are
random and independent of the communication code we have $u_{11} = x$
so that $u_{12} = u_{22} = 1 - x$ and $u_{21} = x$.
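The bookkeeping of encounter probabilities can be made concrete with a few lines of code. The sketch below (with the hypothetical name `encounter_probs`) derives the remaining three probabilities from $u_{11}$ using the relations above; it assumes $0 < x < 1$ so that the divisions are well defined.

```python
def encounter_probs(x, u11):
    """Return (u11, u12, u21, u22) given the fraction x and u11."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    u12 = 1.0 - u11                              # normalization for strategy 1
    u21 = x * (1.0 - u11) / (1.0 - x)            # from x*u12 = (1-x)*u21
    u22 = (1.0 - 2.0 * x + x * u11) / (1.0 - x)  # normalization for strategy 2
    return u11, u12, u21, u22
```

Setting `u11 = x` recovers the random-encounter values, e.g. `encounter_probs(0.3, 0.3)` is approximately `(0.3, 0.7, 0.3, 0.7)`.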

The expected payoff for individuals using strategy $i = 1, 2$ is
$F_i(x) = u_{i1} F_{i1} + u_{i2} F_{i2}$ or, explicitly,

$$F_1(x) = F_{12} + (F_{11} - F_{12})\, u_{11}(x) \qquad (4)$$

$$F_2(x) = F_{22} + (F_{12} - F_{22})\, \frac{x}{1 - x}\, [1 - u_{11}(x)]. \qquad (5)$$

A simple deterministic population dynamics model that describes the
competition between the two strategies is obtained by assuming that the
proportion of individuals using strategy 1 in generation $t + 1$ is
proportional to the relative payoff of that strategy in generation $t$,
i.e.,

$$x_{t+1} = \frac{x_t F_1(x_t)}{x_t F_1(x_t) + (1 - x_t) F_2(x_t)} = f(x_t), \qquad (6)$$

which essentially implies that mastery of a public communication system
adds to the reproductive potential of the individuals (Hurford, 1989).
This model is equivalent to the standard genetic algorithm (Mitchell,
1996) procedure with an infinite population size. As expected, $x = 0$
and $x = 1$ are always fixed points of the recursion equation (6). The
issue is to determine their stability and, in the case that both fixed
points are stable, their basins of attraction. As usual, the condition
for the stability of a fixed point $x^*$ is simply $f'(x^*) < 1$ (see,
e.g., Maynard Smith (1982)).
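To illustrate how the recursion behaves, the following sketch iterates Eq. (6) for the special case of random encounters, where the expected payoffs reduce to $F_1 = x F_{11} + (1-x) F_{12}$ and $F_2 = x F_{12} + (1-x) F_{22}$. The function names, the number of generations, and the mixed-pair payoff $1/n + 2r/(n-1)$ are assumptions made for this example.

```python
def payoffs(n, eps, r):
    """Return (F11, F22, F12) for n > 2; the form of F12 is assumed."""
    f11 = 1.0 - eps * (1.0 - r)
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)
    f12 = 1.0 / n + 2.0 * r / (n - 1)
    return f11, f22, f12


def step(x, n, eps, r):
    """One generation of the recursion with random encounters (u11 = x)."""
    f11, f22, f12 = payoffs(n, eps, r)
    f1 = x * f11 + (1.0 - x) * f12  # expected payoff of strategy 1
    f2 = x * f12 + (1.0 - x) * f22  # expected payoff of strategy 2
    return x * f1 / (x * f1 + (1.0 - x) * f2)


def iterate(x0, n, eps, r, generations=500):
    """Iterate the map from x0 and return the final frequency."""
    x = x0
    for _ in range(generations):
        x = step(x, n, eps, r)
    return x
```

With `n = 10` and `eps = r = 0.5`, starting from `x0 = 0.2` the frequency of strategy 1 decays towards 0, whereas starting from `x0 = 0.6` it grows towards 1, which is the bistability discussed in the text.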


4.1. Random encounters

This is the typical scenario used in most computational models for the
evolution of communication (Hurford, 1989; MacLennan, 1991; Nowak and
Krakauer, 1999) and, in particular, Fontanari and Perlovsky (2007) have
considered an agent-based simulation aimed at exploring the
plausibility of the emergence of structured codes in a random encounter
situation. As already mentioned, random encounters are described by
$u_{11} = x$. The stability condition of the fixed point $x = 0$
associated with strategy 2 (random meaning-signal mapping), namely
$f'(0) < 1$, yields $F_{22} > F_{12}$ or, more explicitly,

$$r < \frac{n - 1}{2} \left[ 1 - \frac{1}{n (1 - \epsilon)} \right]. \qquad (7)$$

Since $r \le 1$ and $\epsilon \le 1/2$ this condition is violated only
if $n = 4$ and $\epsilon > 1/4$. Similarly, the fixed point $x = 1$,
associated with strategy 1 (structured meaning-signal mapping), is
stable provided $f'(1) < 1$, which leads to the condition
$F_{11} > F_{12}$, which is satisfied for $n > 2$ regardless of the
values of $r$ and $\epsilon$. In most cases (e.g., $n > 4$) the fixed
points $x = 0$ and $x = 1$ are both stable and so there is an inner
unstable fixed point $x_u$ that delimits the basins of attraction of
the two stable fixed points. It is given by the condition
$F_1(x_u) = F_2(x_u)$, which yields

$$x_u = \left[ 1 + \frac{F_{11} - F_{12}}{F_{22} - F_{12}} \right]^{-1}. \qquad (8)$$

This quantity is the minimum initial fraction of individuals using
strategy 1 above which this strategy dominates the population.
Expression (8) corrects the estimate given in Fontanari and Perlovsky
(2007). In Fig. 2 we illustrate the dependence of $x_u$ on the
parameters of the model. As already pointed out, since for $n = 4$ the
Saussurean fixed point is unstable in the range
$r > \frac{3}{2} \left[ 1 - 1/(4 (1 - \epsilon)) \right]$ we have
$x_u = 0$ in this regime. For $n \to \infty$ we find
$x_u = [2 + \epsilon r/(1 - \epsilon)]^{-1}$.

Fig. 2. Minimum fraction of individuals using structured codes
necessary for this strategy to dominate the population in the case of
random encounters for $\epsilon = 0.5$ and $n$ as indicated in the
figure. The dashed curve is the result for $n \to \infty$.
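The threshold for random encounters is easy to evaluate numerically. The sketch below computes it from the condition $F_1(x_u) = F_2(x_u)$ with $u_{11} = x$ and returns 0 when the fixed point $x = 0$ is unstable. The function names and the mixed-pair payoff $1/n + 2r/(n-1)$ are assumptions of this example rather than code from the original work.

```python
def payoffs(n, eps, r):
    """Return (F11, F22, F12) for n > 2; the form of F12 is assumed."""
    f11 = 1.0 - eps * (1.0 - r)
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)
    f12 = 1.0 / n + 2.0 * r / (n - 1)
    return f11, f22, f12


def threshold_random(n, eps, r):
    """Minimum initial fraction of strategy 1 that leads to its fixation."""
    f11, f22, f12 = payoffs(n, eps, r)
    if f22 <= f12:
        return 0.0  # x = 0 is unstable, so any initial fraction suffices
    return 1.0 / (1.0 + (f11 - f12) / (f22 - f12))
```

For `n = 10` and `eps = r = 0.5` this gives a threshold close to 0.39, and for very large `n` it approaches 0.4, the value obtained by letting $n$ grow in the payoff expressions.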


4.2.  Nonrandom  encounters 

Nonrandomness of encounters is usually modeled by imposing some spatial
structure on the population, in which the individuals are fixed to
lattice sites and so can interact only with their nearest neighbors, or
else are isolated in groups (see, e.g., Oliphant (1996); Noble (2000);
Cangelosi (2001) for this type of approach within the evolution of
communication context). An alternative formulation of nonrandom
encounters which keeps the mathematics simple is to assume that the
frequency of meetings between individuals using strategy 1 is

$$P_{11} = (1 - m)\, x^2 + m x \qquad (9)$$

where $m \in [0, 1]$ is the aggregation parameter (Wright, 1921; Eshel
and Cavalli-Sforza, 1982). In fact, the probability that an individual
using strategy 1 encounters another of its kind is
$u_{11} = P_{11}/x = m + (1 - m) x$, from which we obtain
$u_{22} = m + (1 - m)(1 - x)$. Hence $m$ represents the portion of the
population that meets an individual of the same strategy, whereas the
fraction $1 - m$ meets randomly. The situation of random encounters is
obtained by setting $m = 0$.
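A short sketch may help to see how the aggregation parameter reshapes the encounter probabilities. The helper below, whose name `aggregated_encounters` is hypothetical, starts from the frequency of meetings between strategy-1 individuals and returns all four encounter probabilities; it assumes $0 < x < 1$.

```python
def aggregated_encounters(x, m):
    """Return (u11, u12, u21, u22) for aggregation parameter m in [0, 1]."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    p11 = (1.0 - m) * x * x + m * x  # frequency of 1-1 meetings
    u11 = p11 / x                    # equals m + (1 - m) * x
    u22 = m + (1.0 - m) * (1.0 - x)
    return u11, 1.0 - u11, 1.0 - u22, u22
```

With `m = 0` the result reduces to the random-encounter values `(x, 1 - x, x, 1 - x)`, while with `m = 1` individuals meet only partners of their own kind.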

Fig. 3. Phase diagram in the plane $(m, r)$ showing the regions where
the fixed point $x = 0$ associated with the random meaning-signal
mapping is unstable (above the curves) so the individuals using the
structured communication code can dominate the population even when
their initial frequency is vanishingly small. The parameters are
$n = 10$ and $\epsilon = 0.1, 0.3, 0.5$ as indicated in the figure.

Now the conditions for the stability of the fixed points $x = 0$ and
$x = 1$ become $F_{22} > m F_{11} + (1 - m) F_{12}$ and
$F_{11} > m F_{22} + (1 - m) F_{12}$, respectively. Since
$F_{11} \ge F_{22}$ and $F_{11} > F_{12}$ for $n > 2$ the fixed point
$x = 1$ is always stable regardless of $m$. The situation for the fixed
point $x = 0$, however, changes considerably, as illustrated in Fig. 3,
which shows the regions of stability of this fixed point in the plane
$(m, r)$. Large values of $m$ can, as expected, destabilize this fixed
point. By setting the parameters so as to maximize the advantage of
strategy 1, i.e., $r = 1$ and $\epsilon = 1/2$, we find that the
stability of $x = 0$ is guaranteed provided that $m < m_c$, with
$m_c = 1 - \frac{1}{2} n (n - 3)/(n^2 - 4n + 1)$ for $n > 4$. Note that
$m_c \in [1/6, 1/2]$ as $n$ increases from 5 to $\infty$.
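The critical aggregation can be cross-checked numerically by rearranging the stability condition of $x = 0$ into $m < (F_{22} - F_{12})/(F_{11} - F_{12})$. The sketch below does this; the function names and the mixed-pair payoff $1/n + 2r/(n-1)$ are assumptions made for the example.

```python
def payoffs(n, eps, r):
    """Return (F11, F22, F12) for n > 2; the form of F12 is assumed."""
    f11 = 1.0 - eps * (1.0 - r)
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)
    f12 = 1.0 / n + 2.0 * r / (n - 1)
    return f11, f22, f12


def critical_aggregation(n, eps, r):
    """Largest m for which the fixed point x = 0 remains stable.

    A non-positive value means x = 0 is unstable even for m = 0.
    """
    f11, f22, f12 = payoffs(n, eps, r)
    return (f22 - f12) / (f11 - f12)  # requires F11 != F12
```

For `r = 1` and `eps = 0.5` this evaluates to 1/6 at `n = 5` and approaches 1/2 as `n` grows, in line with the range quoted in the text.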

In the case both fixed points $x = 0$ and $x = 1$ are stable, the inner
unstable fixed point is still given by the condition
$F_1(x_u) = F_2(x_u)$, which now yields

$$x_u = \frac{1}{1 - m} \left( 1 - m\, \frac{F_{11} - F_{12}}{F_{22} - F_{12}} \right) \left( 1 + \frac{F_{11} - F_{12}}{F_{22} - F_{12}} \right)^{-1}. \qquad (10)$$

Figure 4, which exhibits the dependence of the threshold frequency
$x_u$ on the aggregation parameter $m$, reinforces the fact that
$x_u = 0$ in the regions of the space of parameters where the fixed
point associated with the random mapping strategy is unstable.
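The dependence of the threshold on the aggregation parameter can be sketched as follows. The code solves $F_1(x_u) = F_2(x_u)$ with $u_{11} = m + (1 - m) x$ and $u_{22} = m + (1 - m)(1 - x)$, and returns 0 wherever the fixed point $x = 0$ is unstable. The function names, the handling of $m = 1$, and the mixed-pair payoff $1/n + 2r/(n-1)$ are assumptions of this example.

```python
def payoffs(n, eps, r):
    """Return (F11, F22, F12) for n > 2; the form of F12 is assumed."""
    f11 = 1.0 - eps * (1.0 - r)
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)
    f12 = 1.0 / n + 2.0 * r / (n - 1)
    return f11, f22, f12


def threshold_aggregated(n, eps, r, m):
    """Threshold frequency of strategy 1 for aggregation parameter m."""
    f11, f22, f12 = payoffs(n, eps, r)
    if f22 <= f12 or m >= 1.0:
        return 0.0  # treated here as the regime where x = 0 is unstable
    a = (f11 - f12) / (f22 - f12)
    x_u = (1.0 - m * a) / ((1.0 - m) * (1.0 + a))
    return max(0.0, x_u)  # a negative root means x = 0 is unstable
```

With `m = 0` the function reproduces the random-encounter threshold, and for `n = 10`, `eps = r = 0.5` it drops to 0 once `m` exceeds roughly 0.64.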


4.3.  Spatially  structured  populations 

In support of the findings of the previous section, here we report
results of agent-based simulations where the spatial organization of
the population is explicitly taken into account. In particular, we
assume that $N$ individuals are placed in equidistant sites on a ring
(one individual per site), and each individual can interact only with
neighbors up to its $K$th nearest on either side. The fully connected
situation (i.e., an individual interacts with the $N - 1$ remaining
individuals in the population) is recovered by setting
$K = (N - 1)/2$. As before, the individuals can use either strategy 1
or strategy 2 and the payoffs resulting from their interactions are
given by Eqs. (1)-(3). We recall that each interaction comprises two
events in which the individuals exchange roles as signaller and
receiver.

Fig. 4. Minimum fraction of individuals using structured communication
necessary for this strategy to dominate the population in the case of
nonrandom encounters for $\epsilon = r = 0.5$ and (solid curves from
bottom to top) $n = 4$, 5, and 10. The dashed curve is the result for
$n \to \infty$.

The fitness of an individual is evaluated by computing the total payoff
it obtains when interacting with its neighbors up to the $K$th nearest.
Once the fitness values of all individuals are known, we compute the
total fitness of the population and then the relative fitness of each
individual. The next step is to choose a single individual, say $I$,
for replication with probability proportional to its relative fitness.
The copy of $I$ then replaces one of the $2K + 1$ individuals
comprising the neighborhood of influence of $I$ and $I$ itself. The
choice of the individual to be discarded is made randomly without
regard to fitness values. This selection procedure is essentially the
Moran model of population genetics (Ewens, 2004), except for the fact
that the offspring is placed in the region of influence of its parent,
resulting in a situation where neighbors are likely to be genetically
similar (Oliphant, 1996). The repetition of this procedure $N$ times
defines the time unit (one generation) of the dynamics. At the initial
generation ($t = 0$), all individuals adopt strategy 2, except for a
single mutant that uses strategy 1.
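The selection procedure described above can be sketched in a few dozen lines. This is a rough illustration under stated assumptions, not the simulation code used for the reported results: the neighborhood is taken as $K$ sites on each side of an individual (with $2K + 1 \le N$), the mixed-pair payoff is taken as $1/n + 2r/(n-1)$, and all function names and the `max_generations` cap are hypothetical.

```python
import random


def payoffs(n, eps, r):
    """Pair payoffs keyed by (own strategy, partner strategy)."""
    f11 = 1.0 - eps * (1.0 - r)
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)
    f12 = 1.0 / n + 2.0 * r / (n - 1)  # assumed form of the mixed payoff
    return {(1, 1): f11, (2, 2): f22, (1, 2): f12, (2, 1): f12}


def fitness(strategies, i, K, F):
    """Total payoff of individual i against its K neighbors on each side."""
    N = len(strategies)
    total = 0.0
    for d in range(1, K + 1):
        total += F[(strategies[i], strategies[(i - d) % N])]
        total += F[(strategies[i], strategies[(i + d) % N])]
    return total


def run_until_absorption(N, K, n, eps, r, rng, max_generations=10_000):
    """Return True if a single strategy-1 mutant fixes, False if it dies out.

    Returns None if neither happens within max_generations.
    """
    F = payoffs(n, eps, r)
    strategies = [2] * N
    strategies[rng.randrange(N)] = 1
    ones = 1
    for _ in range(max_generations):
        for _ in range(N):  # N replication events make one generation
            weights = [fitness(strategies, i, K, F) for i in range(N)]
            parent = rng.choices(range(N), weights=weights, k=1)[0]
            # The copy lands on one of the 2K+1 sites around the parent.
            target = (parent + rng.randint(-K, K)) % N
            ones += (strategies[parent] == 1) - (strategies[target] == 1)
            strategies[target] = strategies[parent]
            if ones == 0:
                return False
            if ones == N:
                return True
    return None


def fixation_fraction(runs, N, K, n, eps, r, seed=0):
    """Fraction of independent runs that end in fixation of strategy 1."""
    rng = random.Random(seed)
    wins = sum(run_until_absorption(N, K, n, eps, r, rng) is True
               for _ in range(runs))
    return wins / runs
```

A call such as `fixation_fraction(200, 21, 1, 10, 0.5, 1.0)` gives a rough estimate of the fixation probability for a small ring; larger populations and more runs are needed for smooth curves, at a correspondingly higher cost because fitness values are recomputed at every replication event.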

Figure 5 illustrates the time evolution of the fraction of individuals
that use strategy 1 for four independent runs in the case an individual
can interact only with its two nearest neighbors ($K = 1$). We should
note that these are not typical runs, since in a typical run the mutant
lineage goes extinct in the very first generations. This figure
highlights the stochastic character of the dynamics - the same initial
setting can lead to very different outcomes, namely, the fixation or
the extinction of the mutant lineage. To make this observation
quantitative we record the outcome of 10^ independent runs and present
in Fig. 6 the fraction of them ($P_s$) that resulted in the fixation of
the mutant lineage, i.e., of the structured communication code.

The most relevant information revealed by Fig. 6 is that the
probability of invasion decreases exponentially with increasing $K$. In
particular, for the data exhibited in the figure we find
$P_s = a \exp(-bK)$, where $a \approx 0.03 + 0.47 r - 0.23 r^2$ is an
increasing function of $r \in [0, 1]$ whereas
$b \approx 1.2 (1 - r) + 0.46 r^2$ decreases with increasing
$r \in [0, 1]$. The results for different values of the noise parameter
$\epsilon$ exhibit the same qualitative behavior. In addition, in the
range of $K$ considered here, we have found that the fixation
probability $P_s$ is practically insensitive to the population size
$N$. Hence, in agreement with the findings of the previous section, the
aggregation of individuals using the same communication system is
ultimately the mechanism that leads to the spread of advantageous
linguistic innovations in a population.

Fig. 5. Fraction of individuals that use strategy 1 in four population
samples of $N = 101$ individuals placed in equidistant sites on a ring.
Each individual can interact only with its first nearest neighbors
($K = 1$). The initial condition is $x_0 = 1/N$ and the parameters are
$r = 1$, $\epsilon = 0.5$, and $n = 10$.

Fig. 6. Probability that the lineage of a single mutant that uses
strategy 1 overtakes the resident population in a chain with $N = 101$
individuals, where each individual interacts with its $2K$ nearest
neighbors. The parameters are $\epsilon = 0.5$, $n = 10$ and (top to
bottom) $K = 1, 2, \ldots, 7$. The lines are guides to the eye.

Fig. 7. Per capita growth rate of individuals using the structured
communication code as a function of the fraction of the population
which adopts that strategy for $m = 0$ and $m = 0.9$ as indicated in
the figure. The parameters are $n = 10$ and $\epsilon = r = 0.5$.


5. Discussion


Understanding how innovations that increase the expressive power of
individuals can spread through a population and eventually become fixed
is the essence of any evolutionary explanation of language evolution.
However, the finding that the adoption of any particular trait (a
structured communication code, in the present context) is better for a
population, in the sense that it yields a higher overall payoff, is no
guarantee that such a trait will actually spread in the population. As
pointed out by Cavalli-Sforza and Feldman (1983), since communication
takes place between two or more individuals, the selective process is
frequency dependent and so communication cannot evolve in a simple
scenario in which the individuals meet randomly. Those authors have
argued that this obstacle can be removed, however, if the communication
events occur predominantly within the family or among close relatives.
Interestingly, this same idea reappeared about ten years later as the
"mother tongue" scenario, which purports that language evolved as a
communication system used among kin, especially between mothers and
their offspring, so as to resolve the difficulties inherent in the
altruistic behavior of the signaller when passing relevant information
to the receiver (Fitch, 2004).

The paradox of the evolution of communication in a panmictic population
is, in fact, an older idea: the very notion that intraspecific
cooperation might lead to an inverse density dependence of the growth
rate of some social animals is the essence of the Allee effect (Allee,
1931) (see Courchamp et al. (1999) for a review). Figure 7 illustrates
the Allee effect, i.e., the inverse density dependence of the per
capita growth rate in the case of random encounters ($m = 0$) and the
usual density dependence in the case of strong assortative meetings
($m = 0.9$). Extinction is certain whenever the population of
individuals who have strategy 1 reaches a frequency value for which the
growth rate is negative.
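The contrast between the two regimes can be reproduced with a short calculation. The text does not spell out how the per capita growth rate is defined, so the sketch below assumes $g(x) = (x_{t+1} - x_t)/x_t$, i.e., the payoff of strategy 1 divided by the mean payoff, minus one. The function names and the mixed-pair payoff $1/n + 2r/(n-1)$ are likewise assumptions.

```python
def payoffs(n, eps, r):
    """Return (F11, F22, F12) for n > 2; the form of F12 is assumed."""
    f11 = 1.0 - eps * (1.0 - r)
    f22 = 1.0 - eps + 2.0 * eps * r / (n - 1)
    f12 = 1.0 / n + 2.0 * r / (n - 1)
    return f11, f22, f12


def growth_rate(x, n, eps, r, m):
    """Assumed per capita growth rate of strategy 1 at frequency x."""
    f11, f22, f12 = payoffs(n, eps, r)
    u11 = m + (1.0 - m) * x          # strategy 1 meets its own kind
    u22 = m + (1.0 - m) * (1.0 - x)  # strategy 2 meets its own kind
    f1 = u11 * f11 + (1.0 - u11) * f12
    f2 = (1.0 - u22) * f12 + u22 * f22
    mean = x * f1 + (1.0 - x) * f2
    return f1 / mean - 1.0
```

For `n = 10` and `eps = r = 0.5`, the rate is negative and increasing in `x` at low frequency when `m = 0`, whereas it is positive and decreasing in `x` when `m = 0.9`.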

In this contribution, we have expanded the work of Cavalli-Sforza and
Feldman (1983) by showing that a different communication code, even
when clearly advantageous in comparison with the code adopted by the
resident population, is likely to become established only if some
aggregation (or segregation) mechanism is acting on the population.
There is vast evidence of this process in the linguistic literature,
the most recent example being probably the development of the Black
English Vernacular dialect in black ghettos in America (Pinker, 1994).

Acknowledgments 

This work was supported in part by the Air Force Office of Scientific
Research, Air Force Materiel Command, USAF, under grant number
FA9550-06-1-0202, and in part by CNPq and FAPESP, Project No.
04/06156-3. The U.S. Government is authorized to reproduce and
distribute reprints for Governmental purposes notwithstanding any
copyright notation thereon.

References 

Abbott, B., 2000. Fodor and Lepore on Meaning Similarity and
Compositionality. The Journal of Philosophy 97, 454-455.

Allee, W. C., 1931. Animal Aggregations. A Study in General Sociology.
University of Chicago Press, Chicago.

Brighton, H., Smith, K., Kirby, S., 2005. Language as an evolutionary
system. Phys. Life Rev. 2, 177-226.

Cangelosi, A., 2001. Evolution of Communication and Language Using
Signals, Symbols, and Words. IEEE Trans. Evol. Comput. 5, 93-101.

Cavalli-Sforza, L. L., Feldman, M. W., 1983. Paradox of the evolution
of communication and of social interactivity. Proc. Natl. Acad. Sci.
USA 80, 2017-2021.

Chomsky, N., 1972. Language and mind. Harcourt Brace Jovanovich, New
York.

Churchland, P. M., 1998. Conceptual Similarity across Sensory and
Neural Diversity: The Fodor/Lepore Challenge Answered. The Journal of
Philosophy 95, 5-32.

Courchamp, F., Clutton-Brock, T., Grenfell, B., 1999. Inverse density
dependence and the Allee effect. Trends Ecol. Evol. 14, 405-410.

Dawkins, R., Krebs, J. R., 1978. Animal signals: information or
manipulation? in: Krebs, J. R., Davies, N. B. (Eds.), Behavioural
ecology: an evolutionary approach. Blackwell Scientific Publications,
Oxford, UK, pp. 282-309.

Deacon, T. W., 1997. The Symbolic Species. W.W. Norton & Company, New
York.

Dunbar, R., 1996. Grooming, Gossip, and the Evolution of Language.
Harvard University Press, Cambridge, MA.

Eshel, I., Cavalli-Sforza, L. L., 1982. Assortment of encounters and
evolution of cooperativeness. Proc. Natl. Acad. Sci. USA 79, 1331-1335.


Ewens, W. J., 2004. Mathematical Population Genetics, 2nd edition.
Springer, New York.

Fitch, W. T., 2004. Kin selection and mother tongues: A neglected
component in language evolution, in: Oller, K., Griebel, U. (Eds.),
Evolution of Communication Systems: A Comparative Approach, MIT Press,
Cambridge, MA, pp. 275-296.

Fodor, J., 1983. The Modularity of Mind. MIT Press, Cambridge, MA.

Fodor, J., Lepore, E., 1999. All at Sea in Semantic Space: Churchland
on Meaning Similarity. The Journal of Philosophy 96, 381-403.

Fontanari, J. F., Perlovsky, L. I., 2007. Evolving compositionality in
evolutionary language games. IEEE Trans. Evol. Comput.,
doi:10.1109/TEVC.2007.892763.

Fudenberg, D., Tirole, J., 1991. Game Theory. MIT Press, Cambridge, MA.

Gordon, P., 2004. Numerical cognition without words: Evidence from
Amazonia. Science 306, 496-499.

Hauser, M. D., 1996. The Evolution of Communication. MIT Press,
Cambridge, MA.

Hurford, J. R., 1989. Biological evolution of the Saussurean sign as a
component of the language acquisition device. Lingua 77, 187-222.

MacLennan, B. J., 1991. Synthetic ethology: an approach to the study of
communication, in: Langton, C. G., Taylor, C., Doyne Farmer, J.,
Rasmussen, S. (Eds.), Artificial Life II, SFI Studies in the Sciences
of Complexity, vol. X. Addison-Wesley, Redwood City, pp. 631-658.

Maynard Smith, J., 1982. Evolution and the Theory of Games. Cambridge
University Press, Cambridge, UK.

Mitchell, M., 1996. An Introduction to Genetic Algorithms. MIT Press,
Cambridge, MA.

Mufwene, S. S., 2001. The Ecology of Language Evolution. Cambridge
University Press, Cambridge, UK.

Noble, J., 2000. Cooperation, competition and the evolution of
prelinguistic communication, in: Knight, C., Studdert-Kennedy, M.,
Hurford, J. (Eds.), The Evolutionary Emergence of Language, Cambridge
University Press, Cambridge, UK, pp. 40-61.

Nowak, M. A., Krakauer, D. C., 1999. The evolution of language. Proc.
Natl. Acad. Sci. USA 96, 8028-8033.

Nowak, M. A., Plotkin, J. B., Krakauer, D. C., 1999. The Evolutionary
Language Game. J. Theor. Biol. 200, 147-162.

Nowak, M. A., Komarova, N. L., Niyogi, P., 2002. Computational and
evolutionary aspects of language. Nature 417, 611-617.

Oliphant, M., 1996. The dilemma of Saussurean communication. BioSystems
37, 31-38.

Petitto, L. A., 1994. Language in the prelinguistic child, in: Bloom,
P. (Ed.), Language acquisition: Core readings. MIT/Bradford Press,
Cambridge.

Pinker, S., 1994. The Language Instinct. Penguin Press, London.

Pinker, S., Bloom, P., 1990. Natural language and natural selection.
Behav. Brain Sci. 13, 707-784.




Radick, G., 2002. Darwin on Language and Selection. Selection 3, 7-16.

de Saussure, F., 1966. Course in General Linguistics. Translated by
Wade Baskin. McGraw-Hill Book Company, New York.

Seyfarth, R. M., Cheney, D. L., Marler, P., 1980. Monkey responses to
three different alarm calls: Evidence of predator classification and
semantic classification. Science 210, 801-803.

Smith, K., Kirby, S., Brighton, H., 2003. Iterated Learning: a
framework for the emergence of language. Artificial Life 9, 371-386.

Wright, S., 1921. Systems of mating. III. Assortative mating based on
somatic resemblance. Genetics 6, 144-161.

Zahavi, A., 1975. Mate selection: A selection for a handicap. J. Theor.
Biol. 53, 205-214.

Zahavi, A., 1993. The fallacy of conventional signalling. Proc. Roy.
Soc. London B 340, 227-230.

Zuidema, W., 2003. Optimal communication in a noisy and heterogeneous
environment. Lecture Notes in Artificial Intelligence 2801, 553-563.




How  communication  can  improve  differentiation  in  the 
Modeling  Field  Theory  framework 

Jose F. Fontanari

Universidade de São Paulo, São Carlos, Brazil, fontanari@ifsc.usp.br

Leonid I. Perlovsky

Harvard University, Cambridge MA and The Air Force Research Laboratory, SN, Hanscom, MA

Leonid.Perlovsky@hanscom.af.mil


Abstract  —  We  propose  a  discrimination  task  scenario  to 
study  language  acquisition  in  which  an  agent  receives 
linguistic  input  from  an  external  teacher,  in  addition  to 
the  sensory  stimuli  from  the  objects  that  make  up  the 
environment.  The  agent  is  endowed  with  the  Modeling 
Field  Theory  (MFT)  categorization  mechanism,  which 
enables  it  to  identify  a  few  objects  (or  categories) 
composed  of  hundreds  of  random  pixels  (instances).  We 
show  that  the  agent  with  language  is  capable  of 
differentiating  objects  or  categories  that  it  could  not 
distinguish  without  language. 


1.  Introduction 

The nature of the selective pressures accountable for the emergence of
language has been the object of passionate debate ever since the
viewpoint that language evolved through natural selection became
dominant in the scientific community [1]. In this contribution we
examine the suggestion that the selective pressures for language have
come from the brute exigencies of survival, e.g., hunting, food
gathering and predator detection (see [2], [3]). We refer the reader to
Ref. [4] for the alternative and perhaps more popular stance that the
leading role in language evolution was played instead by the demands of
the social life of early hominids. Rather than focusing on the "origin"
issue, here we take a more pragmatic view and consider these elementary
survival needs as problems to be solved by the individuals (agents, in
our case), and ask whether and how communication can improve their
performance in solving a specific problem relevant to the individuals'
endurance.

The specific task we consider in this contribution is the
differentiation problem, i.e., how agents develop a more detailed
knowledge of their surroundings. In fact, one possible advantage of
communication is that a group of individuals with this capacity can
perceive their environment beyond the limits of their senses: an
individual unable to communicate can access its environment based on
the information provided by its own senses only [5], [6]. Here we use
the term differentiation as synonymous with discrimination. To be able
to discriminate is to be able to judge whether two inputs are the same
or different [7]. The ability to discriminate inputs depends on the
constitution of iconic representations: same/different judgments are
based on the sameness or difference of these representations, according
to some inherent similarity measure. Discrimination is clearly
independent of identification as one can discriminate things without
knowing what they are [7]. For identification, icons must be reduced to
a small set of invariant features that will reliably distinguish
members of different categories. Recently we have shown how a novel
adaptive approach to concept formation - Modeling Field Theory (MFT)
[8] - can successfully integrate and implement these two tasks into a
simple autonomous neural-network-like scheme [9], [10]. Here we advance
further and allow the agents endowed with a categorization system based
on MFT to create symbolic or linguistic representations for members of
a category, i.e., to name the category. This is a modest first step
towards the ambitious program of fully integrating language and
cognition [11], [12].

In the next section, we describe the environment in which the agent is
embodied and embedded - the Umwelt in the ethologists' jargon - as well
as the task posed to it. In section 3 we briefly review the MFT
formalism within the context of the specific categorization problem
addressed in this paper. In section 4 we present the results of the
simulation of the MFT dynamics in the case the agent does not receive a
linguistic input, and in section 5 we address the more interesting case
where an external teacher names the objects as they are perceived by
the agent. Finally, section 6 summarizes the main conclusions.


2.  The  differentiation  task 

We assume that the world contains a certain number of objects whose
features (e.g., color, smell, coordinates in a grid, texture, etc.) are
modeled by overlapping sets of points drawn from Gaussian
distributions. Figure 1 displays the particular instance we will
consider in this paper. There are at least two (equivalent)
interpretations for the problem we are about to tackle. First, we can
view each point in the figure as representing two particular features
of 600 objects which belong to six distinct categories. The issue is
then to identify these categories. Note that identification presumes
prior discrimination. Second, the 600 points displayed in Fig. 1
represent pixels of the image of six solid objects projected onto a
two-dimensional retina. The issue is then to identify the objects, a
classic pattern recognition problem [13]. MFT has been applied to this
type of problem with success for many years [8] (see [10] for use of
MFT together with the Akaike Information Criterion [14] to estimate the
number of objects in a scene). Here we adhere to this object-oriented
interpretation. There are, however, a few issues we should mull over
before embarking on the mathematical formulation of the agent-based
model.

When  talking  about  autonomous  differentiation  of  objects 
we  are  implicitly  assuming  that  the  system  knows 
somehow  what  an  object  is.  This  is  a  very  difficult 
question  as  Marr’s  skeptical  remark  readily  reminds  us: 
“...  all  these  things  can  be  an  object  if  you  want  to  think 
of  them  that  way,  or  they  can  be  part  of  a  larger  object” 
[15].  The  notion  of  object,  however,  is  central  to  the 
understanding  of  how  children  acquire  language.  In  that 
case,  the  problem  seems  to  be  solved  by  inborn 
mechanisms  that  implement  the  so-called  principle  of 
cohesion:  an  object  is  a  connected  and  bounded  region  of 
matter  that  maintains  its  connectedness  and  boundaries 
when  it  is  in  motion  [16].  It  would  be  interesting  to  apply 
MFT  to  the  categorization  of  sets  of  pixels  in  movement, 
since  tracking  of  moving  objects  is  one  of  the  traditional 
applications of that method [16]. The crux of the
problem is that the notion of object must be explicitly
built  into  the  categorization  scheme.  In  the  MFT 
framework,  this  is  done  when  we  define  a  priori  the  range 
of  concept  models  that  are  used  to  categorize  the  input 
data. 

The  environment  is  set  so  that  an  agent  cannot  categorize 
and  identify  all  objects,  because  of  the  considerable 
overlap between them (see Figure 1). Inspired by the
“mushroom” world scenario [5], [6], we allow the agent
to  receive  from  the  environment  an  additional  sensory 
input:  a  heard  linguistic  signal.  Here,  we  assume  that  this 
signal  is  produced  by  an  external  teacher  who  has  perfect 
knowledge  of  the  agent’s  environment.  In  the  following 
we  will  show  that  when  the  agent  receives  the  linguistic 
input  associated  to  the  different  objects,  it  can  create  new 
concept-models  and  so  identify  unambiguously  all 
objects.

Figure 1: The six sets of 100 pixels represent the
coordinates $x$ and $y$ of six objects or, alternatively,
features A and B of six categories. The coordinates of
each pixel are drawn from Gaussian distributions of
means 0.1, 0.2, 0.29, 0.3, 0.4, and 0.5 and standard
deviation 0.01.
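
A minimal sketch of how an environment like that of Figure 1 could be generated is given below. It is not the code used to produce the figure: the names `make_environment` and `seed` are hypothetical, and we assume that both coordinates of an object share the same mean, as suggested by the blob centers (0.29, 0.29) and (0.3, 0.3) quoted in section 4.

```python
import random

# Means and standard deviation quoted in the Figure 1 caption.
MEANS = [0.1, 0.2, 0.29, 0.3, 0.4, 0.5]
STD = 0.01
POINTS_PER_OBJECT = 100


def make_environment(seed=0):
    """Return a list of (x, y, label) tuples, 100 per object.

    Assumption: both coordinates of an object are drawn around the same
    mean. `seed` is a hypothetical argument added for reproducibility.
    """
    rng = random.Random(seed)
    pixels = []
    for label, mean in enumerate(MEANS, start=1):
        for _ in range(POINTS_PER_OBJECT):
            x = rng.gauss(mean, STD)
            y = rng.gauss(mean, STD)
            pixels.append((x, y, label))
    return pixels


pixels = make_environment()
print(len(pixels))  # 600
```

The label stored with each pixel plays no role for the agent without language; it only becomes an input once the external teacher is introduced.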


3.  Modeling  field  theory 

The basic idea behind Modeling Field Theory is the
association between lower-level signals (e.g., inputs) and
higher-level concept-models (internal representations)
avoiding the combinatorial complexity inherent to such a
task. This is achieved by using measures of similarity
between concept-models and input signals together with a
new type of logic, so-called fuzzy dynamic logic. We
refer the reader to Perlovsky’s book [8] for a complete
presentation of MFT; here we particularize the general
framework to the problem of categorizing the $N = 600$
pixels depicted in Figure 1. Each pixel is described by the
pair of real variables $(O_{1i}, O_{2i})$ with $i = 1, \ldots, N$. Let us
assume that there are $M$ concept-models described by the
pairs $(S_{1k}, S_{2k})$ with $k = 1, \ldots, M$ that should represent
the original pixels. We define arbitrarily the following
partial similarity measure between object $i$ and concept $k$

$$l(i|k) = \prod_{e=1}^{2} (2\pi\sigma_{ek}^2)^{-1/2} \exp[-(O_{ei} - S_{ek})^2 / 2\sigma_{ek}^2] \qquad (1)$$

where, at this stage, the fuzziness $\sigma_{ek}$ are parameters
given a priori. We refer the reader to Ref. [17] for another
application of MFT in the case of multi-component
inputs. The goal is to find an assignment between models
and objects such that the global similarity

$$L = \sum_i \log \sum_k l(i|k) \qquad (2)$$

is maximized. A fundamental role is played by the fuzzy
association variables $f(k|i)$ defined by

$$f(k|i) = l(i|k) / \sum_{k'} l(i|k') \qquad (3)$$

which give a measure of the correspondence between
object $i$ and concept $k$ relative to all other concepts $k'$.
The  maximization  of  the  global  similarity  L  can  be 
achieved  using  the  MFT  mechanism  of  concept  formation 
which  is  based  on  the  following  dynamics  for  the 
modeling  fields 

$$dS_{ek}/dt = \sum_i f(k|i) [\partial \log l(i|k) / \partial S_{ek}] \qquad (4)$$

for $e = 1, 2$ and $k = 1, \ldots, M$. We note that although the
coordinates $x$ and $y$ of a pixel are independent random
variables, the two components of a modeling field are
coupled dynamic variables. Actually, the term $f(k|i)$ in
Eq. (4) couples not only $S_{1k}$ and $S_{2k}$ but also
components of different modeling fields [17].
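
The quantities entering Eqs. (1), (3) and (4) can be sketched in a few lines. The code below is only an illustration under stated assumptions: it takes the partial similarity to be a product of normalized Gaussians, for which $\partial \log l(i|k) / \partial S_{ek} = (O_{ei} - S_{ek}) / \sigma_{ek}^2$, and the function names are hypothetical rather than taken from any existing implementation.

```python
import math


def partial_similarity(o, s, sigma):
    """l(i|k) for one input o and one model s, assuming Gaussian factors."""
    value = 1.0
    for o_e, s_e, sig_e in zip(o, s, sigma):
        norm = 1.0 / math.sqrt(2.0 * math.pi * sig_e ** 2)
        value *= norm * math.exp(-((o_e - s_e) ** 2) / (2.0 * sig_e ** 2))
    return value


def associations(o, models, sigmas):
    """f(k|i) of Eq. (3) for one input against all models."""
    sims = [partial_similarity(o, s, sig) for s, sig in zip(models, sigmas)]
    total = sum(sims)
    return [v / total for v in sims]


def field_velocity(inputs, models, sigmas):
    """Right-hand side of Eq. (4) for every model k and component e."""
    n_comp = len(models[0])
    velocity = [[0.0] * n_comp for _ in models]
    for o in inputs:
        f = associations(o, models, sigmas)
        for k, (s, sig) in enumerate(zip(models, sigmas)):
            for e in range(n_comp):
                # d log l(i|k) / dS_ek = (O_ei - S_ek) / sigma_ek^2
                velocity[k][e] += f[k] * (o[e] - s[e]) / sig[e] ** 2
    return velocity


# Two inputs and two models placed symmetrically about 0.3.
inputs = [(0.1, 0.1), (0.5, 0.5)]
models = [(0.2, 0.2), (0.4, 0.4)]
sigmas = [(0.5, 0.5), (0.5, 0.5)]
print(associations(inputs[0], models, sigmas))
print(field_velocity(inputs, models, sigmas))
```

Because every input contributes to every model through $f(k|i)$, the velocity of one model depends on the positions of all the others, which is the coupling noted above.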

Proper adjustment of the fuzziness $\sigma_{ek}$ during the
evolution of the modeling fields allows the dynamics to
converge to the maximum of the global similarity $L$. In
particular, we decrease $\sigma_{ek}$ according to the following
prescription

$$\sigma_{ek}^2(t) = \sigma_{ae}^2 \exp(-\alpha_e t) + \sigma_{be}^2 \qquad (5)$$

with $\alpha_e = 5 \times 10^{[\ldots]}$, $\sigma_{be} = 0.03$, and $\sigma_{ae} = 0.5$ for
$e = 1, 2$. Note that these parameters are the same for all
models $k = 1, \ldots, M$ and components $e = 1, 2$. However,
we will need different parameter values to stabilize the
linguistic component $e = 3$, which we will introduce in
Section 5.
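
The annealing of the fuzziness can be illustrated with a short function. This is a sketch under assumptions: it takes the schedule to be $\sigma^2(t) = \sigma_a^2 \exp(-\alpha t) + \sigma_b^2$, and the decay rate `alpha` used in the example is a placeholder, since the exponent of the value quoted above is not legible in this copy. Only `sigma_a = 0.5` and `sigma_b = 0.03` come from the text.

```python
import math


def fuzziness(t, sigma_a, sigma_b, alpha):
    """sigma(t) for the assumed schedule sigma^2 = sigma_a^2 exp(-alpha t) + sigma_b^2."""
    return math.sqrt(sigma_a ** 2 * math.exp(-alpha * t) + sigma_b ** 2)


# alpha = 1e-3 is a placeholder value chosen for illustration only.
for t in (0.0, 1000.0, 5000.0, 20000.0):
    print(t, fuzziness(t, sigma_a=0.5, sigma_b=0.03, alpha=1e-3))
```

The schedule starts close to `sigma_a`, so that every model overlaps with every input, and settles at the baseline `sigma_b`, which sets the final resolution.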

In what follows we will set $M = 6$ so Eq. (4) stands for a
set of twelve nonlinear coupled equations, which are
solved with Euler’s method using the step-size $h = 10^{[\ldots]}$.
As mentioned before, use of the MFT approach in
conjunction with the Akaike Information Criterion has
allowed us to design a categorization system that correctly
infers the true number of objects in an environment
similar to that exhibited in Figure 1 [10], but for the
purposes of this paper, any choice of $M \geq 6$ will be
satisfactory.
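
The Euler integration of Eq. (4) together with the decreasing fuzziness can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the environment has three well-separated blobs of ten points rather than the six objects of Figure 1, a single fuzziness is shared by all models and components, and the step size `H` and decay rate `ALPHA` are placeholders chosen so that the run is short and numerically stable (the values used for the figures are not legible in this copy).

```python
import math
import random


def log_similarity(o, s, sigma):
    """log l(i|k) for one input and one model with a common fuzziness sigma."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (o_e - s_e) ** 2 / (2.0 * sigma ** 2)
        for o_e, s_e in zip(o, s)
    )


def global_similarity(inputs, models, sigma):
    """L = sum_i log sum_k l(i|k), Eq. (2)."""
    return sum(
        math.log(sum(math.exp(log_similarity(o, s, sigma)) for s in models))
        for o in inputs
    )


def euler_step(inputs, models, sigma, h):
    """One Euler update S <- S + h * dS/dt, with dS/dt taken from Eq. (4)."""
    velocity = [[0.0] * len(s) for s in models]
    for o in inputs:
        sims = [math.exp(log_similarity(o, s, sigma)) for s in models]
        total = sum(sims)
        for k, s in enumerate(models):
            f_ki = sims[k] / total  # fuzzy association f(k|i), Eq. (3)
            for e in range(len(s)):
                velocity[k][e] += f_ki * (o[e] - s[e]) / sigma ** 2
    return [
        [s_e + h * v_e for s_e, v_e in zip(s, v)]
        for s, v in zip(models, velocity)
    ]


# Toy environment (assumption): three blobs of ten points each.
rng = random.Random(1)
inputs = [
    (rng.gauss(m, 0.01), rng.gauss(m, 0.01))
    for m in (0.1, 0.3, 0.5)
    for _ in range(10)
]
models = [[0.15, 0.15], [0.3, 0.3], [0.45, 0.45]]

H = 1e-5        # placeholder step size
ALPHA = 1000.0  # placeholder decay rate per unit time
SIGMA_A, SIGMA_B = 0.5, 0.03

for n in range(1500):
    t = n * H
    sigma = math.sqrt(SIGMA_A ** 2 * math.exp(-ALPHA * t) + SIGMA_B ** 2)
    models = euler_step(inputs, models, sigma, H)

print(models)
```

With these settings each update is a convex combination of the current field value and the inputs, so the fields cannot leave the region spanned by the data and the initial conditions; a larger step size would lose this property once the fuzziness becomes small.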

4. Agent without language

The  problem  is  motivated  by  the  inability  of  an  agent  to 
distinguish  between  the  six  objects  that  make  up  its 
environment.  The  difficulty,  of  course,  is  to  distinguish 
between the two sets of pixels centered at the coordinates
(0.29, 0.29) and (0.3, 0.3) as shown in Figure 1. Figures 2
and 3 illustrate the time dependence of the two
components, $e = 1$ (Feature A) and $e = 2$ (Feature B), of
the modeling fields $S_{ek}$. Since Feature B is essentially
equivalent to Feature A, the associated modeling field
components exhibit a very similar behavior pattern. What
distinguishes these components is simply their initial
values, which were chosen randomly from the uniform
distribution. The point of Figures 2 and 3 is to stress the
rather expected failure of most categorization methods to
distinguish highly overlapping objects. The agent is able
to identify four of the six objects displayed in Figure 1
and, in addition, it can clearly discriminate the two
overlapping objects from the other four.

Figure 2 - Evolution of the component $e = 1$ (feature A)
of the six modeling fields when the agent without
language perceives the environment composed of the six
“objects” illustrated in Figure 1. Notice that the agent is
unable to distinguish between the two pixel blobs located
at (0.29, 0.29) and (0.3, 0.3).

Figure 3 - Same as Figure 2 but for component $e = 2$
(feature B) of the modeling fields. So, even considering
both features A and B, the agent without language cannot
identify all objects.

5.  Agent  with  language 

Following Refs. [5], [6], we assume that besides the
physical stimuli $(O_{1i}, O_{2i})$ with $i = 1, \ldots, N$ the agent
receives from the environment an additional “linguistic”
input $W_i$ associated to each of the $N$ pixels. In practice,
this amounts to assuming the existence of an external
teacher who, while pointing to a pixel $(O_{1i}, O_{2i})$, utters
the word $W_i$. Of course, the teacher utters the same word
for all pixels that make up a single object or for instances
of the same category. Hence, $W_i$, $i = 1, \ldots, N$, takes on
only six different values (i.e., there are only six different
words). The nature of the signals $W_i$ (i.e., the words) is
completely distinct from that of the inputs $(O_{1i}, O_{2i})$. To
take this into account we assume that the words $W_i$ take
on the integer values $1, 2, \ldots, 6$.

From  the  mathematical  aspect,  inclusion  of  the  additional, 
linguistic  component  to  characterize  the  pixels  does  not 
alter  in  any  essential  way  the  basic  equations  of  the  field 
dynamics,  Eqs.  (4)  and  (5).  In  particular,  the  inputs  are 
now described by the triples $(O_{1i}, O_{2i}, W_i)$, $i = 1, \ldots, N$,
which should be matched by the three-component
modeling fields $(S_{1k}, S_{2k}, S_{3k})$, $k = 1, \ldots, M$. Hence, the
form of the field equations is unaltered, and the addition
of a third component is taken into account by letting the
index $e$ run from 1 to 3. The parameters for the linguistic
component $e = 3$ are $\alpha_3 = 1 \times 10^{[\ldots]}$, $\sigma_{a3} = 3$ and
$\sigma_{b3} = 0.1$. The reason for the larger value of $\sigma_{a3}$, as
compared with the values of $\sigma_{a1}$ and $\sigma_{a2}$, is that the
separation between the target words is greater than the
distance between the mean values of the Gaussian
distributions used to generate the pixels of Figure 1. We
note that for the successful convergence of the MFT
scheme one should always start with large fuzziness to
guarantee that at the outset any one model has a nonzero
similarity with all input data [9]. Moreover, the choice
of a smaller value of $\alpha_3$ emphasizes the need for
different learning times for assimilation of inputs of
distinct nature. Here we let the linguistic component
evolve much more slowly than the non-linguistic ones.
Because of this slower rate, the time scale for
convergence of the dynamics is increased as seen in the
next four figures.
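
The effect of the third component on the fuzzy associations can be illustrated with a single pixel. The sketch below is an assumption-laden example, not the simulation behind the figures: it evaluates the associations at the baseline fuzziness values quoted above (0.03 for the features, 0.1 for the word), and the two candidate models are hypothetical field values placed at the centers of the overlapping objects.

```python
import math


def associations(o, models, sigma):
    """f(k|i) for one input; sigma[e] is the fuzziness of component e."""
    sims = []
    for s in models:
        log_l = 0.0
        for o_e, s_e, sig_e in zip(o, s, sigma):
            log_l += -0.5 * math.log(2.0 * math.pi * sig_e ** 2)
            log_l -= (o_e - s_e) ** 2 / (2.0 * sig_e ** 2)
        sims.append(math.exp(log_l))
    total = sum(sims)
    return [v / total for v in sims]


# One pixel of the object named "3" and two hypothetical models.
pixel = (0.29, 0.29, 3)
models = [(0.29, 0.29, 3.0), (0.30, 0.30, 4.0)]

# Features only (e = 1, 2), baseline fuzziness 0.03.
f_features = associations(pixel[:2], [m[:2] for m in models], (0.03, 0.03))
# Features plus word (e = 1, 2, 3), word fuzziness 0.1.
f_with_word = associations(pixel, models, (0.03, 0.03, 0.1))

print(f_features)   # close to an even split between the two models
print(f_with_word)  # almost all weight on the first model
```

With the two features alone the pixel is shared almost equally by both models, whereas the unit separation between the words, large compared with a fuzziness of 0.1, assigns it to a single model.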


In Figures 4 to 7 we show the time evolution of the three
components of the modeling fields when the
linguistic input is considered.

Figure 5 - Evolution of the component $e = 1$ (feature A)
of the six modeling fields for the agent with language.
Though barely visible on this scale, the agent now
identifies six distinct objects or categories (see Figure 6).


[Plot; horizontal axis: time/50.]


Figure 6 - A closer view of the modeling fields
associated to the two overlapping objects displayed in
Figure 5 confirms that the agent indeed discriminates
Feature A.

Figure 4 - Evolution of the linguistic component ($e = 3$)
of the modeling fields whose initial values were chosen
randomly among the integers $1, 2, \ldots, 6$.

Figure 7 - Evolution of the component $e = 2$ (feature B)
of the six modeling fields for the agent with language.
The distinction between the two overlapping objects is
more perceptible for this component.


The  one-to-one  correspondence  between  the  input-words 
and the component $e = 3$ of the modeling fields is easily
achieved as shown in Figure 4. As soon as the agent
assimilates the fact that words 3 and 4 are different, which
happens at approximately $t = 50 \times 600$, the two
overlapping objects are differentiated, as illustrated in
Figures 5, 6, and 7. This is a remarkable finding: the
extra information carried by the linguistic component
allowed the agent to create distinct non-linguistic
representations  for  the  objects.  We  note  that  the 
asymptotic  values  of  the  modeling  fields  illustrated  in 
these  figures  do  not  match  exactly  the  means  of  the 
Gaussian  distributions  used  to  generate  the  pixels  of 
Figure  1,  but  they  are  close  enough  to  them  to  identify 
unambiguously  the  six  objects  or  categories. 


6.  Conclusions 


We  have  reported  a  computational  experiment  in  which 
the  addition  of  language,  or  more  precisely  of  a  linguistic 
signal, affects the manner in which an agent processes its other
sensory  inputs.  Remarkably,  the  agent  with  language  is 
capable  of  differentiating  objects  or  categories  that  it 
could  not  distinguish  without  language.  We  note  that  what 
distinguishes  linguistic  signals  (e.g.,  word  sounds)  from 
other  stimuli  is  that  the  agent  experiences  the  sounds  in 
concomitance  with  non-linguistic  experience.  The  crucial 
role  played  by  the  linguistic  signal  in  our  experiment 
contrasts with the milder claim that language
enhances performance only if the agent has already
evolved an ability to respond appropriately to the visually
perceived objects without language [5].

In our scenario, the agent develops only the capacity to
“understand” the words uttered by the external teacher;
the  production  of  words  was  not  considered  as  it  must 
necessarily  involve  at  least  two  agents  (see  below).  The 
agent  “understands”  the  meaning  of  a  word  when  it 
associates  that  word  stimulus  with  a  concrete  object  in  the 
environment.  This  type  of  association  is  made  very  simple 
in  the  MFT  framework.  In  that  sense,  the  experiment 
reported  here  is  relevant  to  the  issue  of  language 
acquisition  by  children  [16]. 

As pointed out above, the study of the emergence of the
ability to produce linguistic signals requires the use of two
or more agents. The main difficulty in adapting our
discrimination task scenario to the multi-agent situation,
and so replacing the external teacher by the agents
themselves, is that we need to assume that one agent (the
speaker) can somehow distinguish between the
overlapping objects while the other agent (the hearer)
cannot. This type of unwarranted assumption is made in
the mushroom world scenario [5], [6]. A less far-fetched
possibility is to assume that the agents can perceive
different features of the objects. So it is plausible to admit
that what is seen as a single object by one agent is
perceived as two or more objects by another agent, since
they process different features of their environment.
Work in this direction is under way.


Language acquisition and category discrimination in the Modeling
Field Theory framework

Jose F. Fontanari and Leonid I. Perlovsky


Abstract — We  propose  a  categorization  task  scenario  to 
study  language  acquisition  in  which  an  agent  receives  linguistic 
input  from  an  external  teacher,  in  addition  to  sensory  stimuli 
from  the  objects  that  make  up  the  environment.  The  agent  is 
endowed  with  the  Modeling  Field  Theory  (MFT)  categorization 
mechanism,  which  enables  it  to  identify  overlapping  categories 
from exposure to hundreds of examples. Rather
remarkably,  we  find  that  the  agent  with  language  is  capable  of 
differentiating  object  features  that  it  could  not  distinguish 
without  language.  In  this  sense,  the  linguistic  stimuli  prompt 
the  agent  to  redefine  and  refine  the  discrimination  capacity  of 
its  sensory  channels. 


I.  Introduction 

A  major  challenge  to  the  paladins  of  the  viewpoint  that 
language  has  evolved  through  natural  selection  as  any 
other  biological  organ  [1]  is  to  identify  the  nature  of 
the  selective  pressures  accountable  for  the  emergence  of 
language  -  a  capability  that  singles  out  the  human  species 
from the other animals on the planet. In this contribution we
examine the suggestion that such selective pressures have
come from the exigencies of survival, e.g., hunting, food
gathering and predator detection (see [2], [3]). We refer the
reader to Ref. [4] for the alternative stance that the leading
role  in  language  evolution  was  played  instead  by  the 
demands  of  the  social  life  of  early  hominids.  However, 
rather  than  focusing  on  the  evolution  issue,  here  we  pursue  a 
more  pragmatic  approach  and  consider  these  elementary 
survival  needs  as  problems  to  be  solved  by  the  individuals, 
and  ask  whether  and  how  communication  can  improve  their 
performance  to  solve  categorization  problems. 

The  specific  task  we  consider  in  this  contribution  is  the 
differentiation  problem,  i.e.,  how  individuals  (agents) 
develop  a  more  detailed  knowledge  of  their  surroundings.  In 
fact,  as  pointed  out  by  Parisi  and  Cangelosi  [5]  (see  also  [6]) 
one possible advantage of communication is that a group of
individuals with this capacity can perceive their environment
beyond the limits of their senses: an individual unable to
communicate can access its environment based on the
information provided by its own senses only. Here we use
the term differentiation as synonymous with discrimination. To
be able to discriminate is to be able to judge whether two
inputs are the same or different. According to Harnad [7],
discrimination is independent of identification as one can
discriminate things without knowing what they are. For
identification, the iconic representations of the raw input
data  must  be  reduced  to  a  small  set  of  invariant  features  that 
will  reliably  distinguish  members  of  different  categories. 
Recently  we  have  shown  how  a  novel  adaptive  approach  to 
concept  formation  -  Modeling  Field  Theory  (MFT)  [8]  - 
can  successfully  integrate  and  implement  these  two  tasks 
into  a  simple  autonomous  neural-networks-like  scheme  [9], 
[10].  Here  we  advance  further  and  allow  the  agents  endowed 
with  a  categorization  system  based  on  MFT  to  learn  from  an 
external  teacher  symbolic  or  linguistic  representations  for 
members  of  a  category,  i.e.,  to  name  the  category.  This  is  a 
modest  first  step  towards  the  ambitious  program  of  fully 
integrating language and cognition [11], [12].

In  the  next  section,  we  describe  the  environment  in  which 
the  agent  lives  as  well  as  the  task  posed  to  it.  In  section  3 
we  briefly  review  the  MFT  formalism  within  the  context  of 
the  specific  categorization  problem  addressed  in  this  paper. 
In  section  4  we  present  the  results  of  the  simulation  of  the 
MFT  dynamics  in  the  case  the  agent  does  not  receive  a 
linguistic  input,  and  in  section  5  we  address  the  more 
interesting  case  where  an  external  teacher  names  the  objects 
as  they  are  perceived  by  the  agent.  Finally,  section  6 
summarizes  the  main  conclusions. 


II. The categorization task

We  assume  that  the  world  contains  a  certain  number  of 
categories  whose  examples  are  modeled  by  overlapping  sets 
of points drawn from Gaussian distributions. Figure 1
displays  the  particular  instance  we  will  consider  in  this 
paper.  We  can  view  each  point  in  the  figure  as  representing 
two  particular  features,  feature  A  and  feature  B,  of  600 
objects  (examples)  which  belong  to  six  distinct  categories. 
The  issue  is  then  to  identify  these  categories.  Note  that 
identification  presumes  prior  discrimination.  The 
environment  is  set  so  that  the  agent  cannot  categorize  and 
identify  all  examples,  because  of  the  considerable  overlap 
between their features (see Fig. 1). Inspired by the
“mushroom” world scenario [5], [6], we allow the agent to
receive  from  the  environment  an  additional  sensory  input:  a 
heard  linguistic  signal.  Here,  we  assume  that  this  signal  is 
produced  by  an  external  teacher  who  has  perfect  knowledge 
of  the  agent’s  environment.  In  the  following  we  will  show 
that  when  the  agent  receives  the  linguistic  input  associated 
to  the  different  objects,  it  can  create  new  concept-models 
and so identify unambiguously all objects.

Fig. 1. The six sets of 100 examples, represented by the features A and B
(e.g., texture and color), of six categories. The coordinates of each pixel
are drawn from Gaussian distributions of means 0.1, 0.2, 0.29, 0.3, 0.4, and
0.5 and standard deviation 0.01. The labels 1, 2, ..., 6 are the words
(names) associated to each example of the six categories.


III. Modeling field theory

The basic idea behind Modeling Field Theory is the
association between lower-level signals (e.g., inputs) and
higher-level concept-models (internal representations)
avoiding the combinatorial complexity inherent to such a
task. This is achieved by using measures of similarity
between concept-models and input signals together with a
new type of logic, so-called fuzzy dynamic logic. We refer
the reader to Ref. [8] for a complete presentation of MFT;
here we particularize the general framework to the problem
of categorizing the $N = 600$ examples of the six categories
depicted in Fig. 1. Each example is described by the pair of
real variables $(O_{1i}, O_{2i})$ with $i = 1, \ldots, N$. Let us assume that
there are $M$ concept-models described by the pairs
$(S_{1k}, S_{2k})$ with $k = 1, \ldots, M$ that should “model” (i.e., create
iconic representations of) the original examples. Hence the
denomination “modeling fields” for the mathematical
quantities $S_{ek}$. We define arbitrarily the following partial
similarity measure between object $i$ and concept $k$

$$l(i|k) = \prod_{e=1}^{2} (2\pi\sigma_{ek}^2)^{-1/2} \exp[-(O_{ei} - S_{ek})^2 / 2\sigma_{ek}^2] \qquad (1)$$

where, at this stage, the fuzziness $\sigma_{ek}$ are parameters given
a priori. We refer the reader to Ref. [13] for another
application of MFT in the case of multi-component inputs.
The goal is to find an assignment between models and
examples such that the global similarity

$$L = \sum_i \log \sum_k l(i|k) \qquad (2)$$

is maximized. The maximization of $L$ can be achieved using
the MFT mechanism of concept formation which is obtained
through the direct maximization of (2) with respect to $S_{ek}$.
The aim here is to derive a dynamical equation for the
modeling fields such that $dL/dt \geq 0$ for all time $t$. This
condition can easily be met by choosing
$dS_{ek}/dt = \partial L / \partial S_{ek}$ since then

$$dL/dt = \sum_{e,k} (\partial L / \partial S_{ek}) (dS_{ek}/dt) = \sum_{e,k} (\partial L / \partial S_{ek})^2 \geq 0 \qquad (3)$$

as required. The calculation of $\partial L / \partial S_{ek}$ is straightforward

$$\frac{\partial L}{\partial S_{ek}} = \sum_i \frac{1}{\sum_{k'} l(i|k')} \frac{\partial l(i|k)}{\partial S_{ek}} \qquad (4)$$

and leads to the following dynamics for the modeling fields

$$dS_{ek}/dt = \sum_i f(k|i) [\partial \log l(i|k) / \partial S_{ek}] \qquad (5)$$

for $e = 1, 2$ and $k = 1, \ldots, M$ and where we have used the
identity $\partial y / \partial x = y \, \partial \log y / \partial x$. The fuzzy association
variables $f(k|i)$ defined by

$$f(k|i) = l(i|k) / \sum_{k'} l(i|k') \qquad (6)$$

play  a  fundamental  role  in  the  interpretation  of  the  MFT 
dynamics  by  giving  a  measure  of  the  correspondence 
between  object  i  and  concept  k  relative  to  all  other 
concepts $k'$. We note that although the features A and B of
an example are independent random variables, the two
components of a modeling field are coupled dynamic
variables. Actually, the term $f(k|i)$ in Eq. (5) couples not
only $S_{1k}$ and $S_{2k}$ but also components of different
modeling fields [13].
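
The step from the gradient of the global similarity (2) to the right-hand side of Eq. (5) can be checked numerically. The sketch below is only an illustration: it assumes Gaussian partial similarities with a single fuzziness `sigma`, the four inputs and two models are arbitrary made-up values, and the function names are hypothetical. It compares the analytic expression $\sum_i f(k|i) (O_{ei} - S_{ek}) / \sigma^2$ with a central finite difference of $L$.

```python
import math


def similarity(o, s, sigma):
    """Gaussian partial similarity l(i|k) with a common fuzziness sigma."""
    return math.prod(
        math.exp(-((o_e - s_e) ** 2) / (2.0 * sigma ** 2))
        / math.sqrt(2.0 * math.pi * sigma ** 2)
        for o_e, s_e in zip(o, s)
    )


def global_similarity(inputs, models, sigma):
    """L of Eq. (2)."""
    return sum(
        math.log(sum(similarity(o, s, sigma) for s in models)) for o in inputs
    )


def analytic_gradient(inputs, models, sigma, k, e):
    """Right-hand side of Eq. (5) for model k, component e."""
    grad = 0.0
    for o in inputs:
        sims = [similarity(o, s, sigma) for s in models]
        f_ki = sims[k] / sum(sims)  # Eq. (6)
        grad += f_ki * (o[e] - models[k][e]) / sigma ** 2
    return grad


def numeric_gradient(inputs, models, sigma, k, e, eps=1e-6):
    """Central finite difference of L with respect to S_ek."""
    up = [list(s) for s in models]
    down = [list(s) for s in models]
    up[k][e] += eps
    down[k][e] -= eps
    return (
        global_similarity(inputs, up, sigma)
        - global_similarity(inputs, down, sigma)
    ) / (2.0 * eps)


inputs = [(0.1, 0.12), (0.29, 0.3), (0.31, 0.28), (0.5, 0.49)]
models = [(0.2, 0.25), (0.4, 0.35)]
for k in range(2):
    for e in range(2):
        a = analytic_gradient(inputs, models, 0.1, k, e)
        n = numeric_gradient(inputs, models, 0.1, k, e)
        print(k, e, a, n)
```

Agreement between the two columns is what guarantees that following Eq. (5) is the same as climbing the gradient of $L$.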

It  can  be  shown  that  the  dynamics  (5)  always  converges  to  a 
(possibly  local)  maximum  of  the  similarity  L  [8].  By 
properly adjusting the fuzziness $\sigma_{ek}$ the global maximum
can be attained. A salient feature of dynamic logic is a match
between parameter uncertainty and fuzziness of similarity.
In what follows we decrease the fuzziness during the time
evolution of the modeling fields according to the following
prescription

$$\sigma_{ek}^2(t) = \sigma_{ae}^2 \exp(-\alpha_e t) + \sigma_{be}^2 \qquad (7)$$

with $\alpha_e = 5 \times 10^{[\ldots]}$, $\sigma_{be} = 0.03$, and $\sigma_{ae} = 0.5$ for $e = 1, 2$.

Note that these parameters are the same for all models
$k = 1, \ldots, M$ and components $e = 1, 2$. However, we will
need different parameter values to stabilize the linguistic
component $e = 3$, which we will introduce in Section 5.
As a guideline for setting the parameter values in (7) we
note that $\sigma_{ae}$ must be chosen large enough such that, at the
beginning, all examples are described by all fields, whereas
the baseline resolution $\sigma_{be}$ must be small enough such that,
at the end, a given field will describe a single category.
However, $\sigma_{be}$ should not be set to too small a value, to
avoid numerical instabilities in the calculation of the partial
similarities (1).
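
The numerical instability mentioned above can be reproduced, and one standard way around it shown, in a few lines. The remedy used here is the log-sum-exp trick, which is our suggestion for illustration and not a technique described in this paper; the function names are hypothetical and a single fuzziness `sigma` is assumed for both components.

```python
import math


def log_similarity(o, s, sigma):
    """log l(i|k) for Gaussian partial similarities with fuzziness sigma."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (o_e - s_e) ** 2 / (2.0 * sigma ** 2)
        for o_e, s_e in zip(o, s)
    )


def naive_associations(o, models, sigma):
    """f(k|i) computed directly; fails when every similarity underflows."""
    sims = [math.exp(log_similarity(o, s, sigma)) for s in models]
    total = sum(sims)
    return [v / total for v in sims]


def stable_associations(o, models, sigma):
    """f(k|i) computed in log space (log-sum-exp trick)."""
    logs = [log_similarity(o, s, sigma) for s in models]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # largest weight is exactly 1
    total = sum(weights)
    return [w / total for w in weights]


example = (0.1, 0.1)
models = [(0.4, 0.4), (0.5, 0.5)]

try:
    print(naive_associations(example, models, 0.005))
except ZeroDivisionError:
    print("naive computation failed: all similarities underflowed to zero")

print(stable_associations(example, models, 0.005))
```

Subtracting the largest log-similarity before exponentiating leaves the ratios in Eq. (6) unchanged, so the two functions agree whenever the direct computation is well defined.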

A  word  is  in  order  about  the  connection  between  the  MFT 
framework and neural networks. An MFT neural architecture
was described in [8], which combines architecture with
models of objects or category examples. Essentially, input
neurons or bottom-up signals encode the feature values of
the category examples, and top-down or priming
signal-fields to these neurons are generated by the modeling fields
$S_{ek}$. Interaction between bottom-up and top-down signals is
determined by the neural weights $f(k|i)$ that associate
signals and models. As described before, these weights are
functions of the model parameters $S_{ek}$, which in turn are
dynamically  adjusted  so  as  to  maximize  the  overall 
similarity  between  category  examples  and  models.  This 
formulation  sets  MFT  apart  from  many  other  neural 
networks.  There  is,  on  the  other  hand,  a  certain  formal 
similarity  between  the  MFT  approach  and  the  Hopfield- 
Tank  neural  network  approach  to  tackle  optimization 
problems  [14].  This  becomes  apparent  when  one 
recognizes  that  the  nature  of  perceptual  problems  dealt  with 
here  is  similar  to  that  of  other  optimization  problems.  In  fact, 
in  both  systems  it  is  the  time  evolution  of  analog  neurons 
that  drives  the  neural  configuration  to  a  maximum  of  the 
cost  function  [the  global  similarity  (2)  in  our  case]. 
Moreover,  the  quality  of  the  solutions  found  by  the  neural 
network  is  greatly  improved  by  annealing  the  analog  gain 
parameter  [15],  in  a  similar  manner  as  the  slow  decrease  of 
the  fuzziness  according  to  (7)  leads  ultimately  to  perfect 
categorization.  In  addition,  the  competition  between 
different concept-models to match the category examples is
reminiscent  of  the  dynamics  of  unsupervised  learning 
algorithms  and,  in  particular,  of  self-organizing  maps  [16]. 

In what follows we will set $M = 6$ so Eq. (5) stands for a set
of twelve nonlinear coupled equations, which are solved
with Euler’s method using the step-size $h = 10^{[\ldots]}$. In a
previous contribution [10], we have combined the MFT
approach with the Akaike Information Criterion [17] to
design a categorization system that correctly infers the true
number of objects in an environment similar to that
exhibited in Fig. 1, but for the purposes of this paper, any
choice of $M \geq 6$ will be satisfactory.


IV. Agent without language

The choice of the particular environment depicted in Fig. 1
is motivated by the inability of the agent to distinguish
between the six categories into which the 600 examples are
naturally organized. The difficulty, of course, is to
distinguish between the two sets of examples centered at the
coordinates (0.29, 0.29) and (0.3, 0.3), which are labeled 3
and 4, respectively, by the external teacher. Figs. 2 and 3
illustrate the time dependence of the two components,
$e = 1$ (feature A) and $e = 2$ (feature B), of the modeling fields
$S_{ek}$. Since feature B is essentially equivalent to feature A,
the associated modeling field components exhibit a very
similar behavior pattern. What distinguishes these
components is simply their initial values, which were
chosen randomly from the uniform distribution. The point of
Figs.  2  and  3  is  to  stress  the  rather  expected  failure  of  most 
categorization  methods  to  distinguish  highly  overlapping 
categories.  The  agent  is  able  to  identify  four  of  the  six 
categories  displayed  in  Fig.  1  and,  in  addition,  it  can  clearly 
discriminate  the  two  overlapping  categories  from  the  other 
four.

Fig. 2. Evolution of the component $e = 1$ (feature A) of the six modeling
fields when the agent without language perceives the environment
composed of 600 examples that belong to six categories as illustrated in Fig.
1. Note that the agent is unable to distinguish between the two categories
labeled 3 and 4 by the external teacher.

Fig. 3. Same as Fig. 2, but for component $e = 2$ (feature B) of the
modeling fields. So, even considering both features A and B, the agent
without language cannot identify all six categories.

V. Agent with language

Motivated  by  the  “mushroom  world”  scenario  [5],  [6],  we 
assume that besides the physical stimuli $(O_{1i}, O_{2i})$,
$i = 1, \ldots, N$, the agent receives from the environment an
additional “linguistic” input $W_i$ associated to each of the
$N = 600$ examples depicted in Fig. 1. In practice, this
amounts to assuming the presence of an external teacher who,
while pointing to an example $(O_{1i}, O_{2i})$, utters the word
$W_i$. Of course, the teacher utters the same word for all
examples belonging to the same category. Hence, $W_i$,
$i = 1, \ldots, N$, takes on only six different values (i.e., there are
only six different words). The nature of the signals $W_i$
(i.e., the words) is completely distinct from that of the inputs
$(O_{1i}, O_{2i})$. To take this into account we assume that the
words $W_i$ take on the integer values $1, 2, \ldots, 6$, rather than
real  values  as  the  modeling  fields  associated  to  the  physical 
features  A  and  B.  These  are  the  category  labels  exhibited  in 
Fig.  1. 
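
As a rough illustration of this setup, the following Python sketch builds a labelled data set in which each example carries two real-valued features and one integer word. Only the sizes (600 examples, six categories, words 1 to 6) are taken from the text. The category means and the common standard deviation are hypothetical placeholders, since the distributions behind Fig. 1 are not restated in this section.

```python
import numpy as np

# Hypothetical category means for features A and B (not the values of Fig. 1).
# Categories 3 and 4 are placed close together to mimic the overlapping pair.
ASSUMED_MEANS = np.array([
    [0.10, 0.10], [0.30, 0.30], [0.50, 0.50],
    [0.52, 0.52], [0.70, 0.70], [0.90, 0.90],
])
ASSUMED_STD = 0.02  # hypothetical common standard deviation


def make_examples(n_per_category=100, seed=0):
    """Return an (N, 3) array of triples (O_1i, O_2i, W_i)."""
    rng = np.random.default_rng(seed)
    rows = []
    for label, mean in enumerate(ASSUMED_MEANS, start=1):
        features = rng.normal(mean, ASSUMED_STD, size=(n_per_category, 2))
        words = np.full((n_per_category, 1), float(label))  # the teacher's word
        rows.append(np.hstack([features, words]))
    return np.vstack(rows)


examples = make_examples()  # 600 rows: feature A, feature B, word
```

The word column is the only component that separates categories 3 and 4 cleanly in such a data set, which is the situation the experiment is designed to probe.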

From the mathematical standpoint, inclusion of the additional,
linguistic component to characterize the examples does not
alter in any essential way the basic equations of the field
dynamics, Eqs. (5)-(7). In particular, the inputs are now
described by the triples $(O_{1i}, O_{2i}, W_i)$, $i = 1, \ldots, N$, which
should be matched by the three-component modeling fields
$(S_{k1}, S_{k2}, S_{k3})$, $k = 1, \ldots, M$. Hence, the form of the field
equations is unaltered, and the addition of a third component
is taken into account by letting the index e run from 1 to 3.
The parameters for the linguistic component e = 3 are
$\alpha_3 = 1 \times 10^{[\text{illegible}]}$, $\sigma_{a3} = 3$ and $\sigma_{b3} = 0.1$. The reason for the
larger value of $\sigma_{a3}$, as compared with the values of $\sigma_{a1}$
and $\sigma_{a2}$, is that the separation between the target words is
greater than the distance between the mean values of the
Gaussian distributions used to generate the category
examples in Fig. 1. We note that for the successful
convergence of the MFT scheme one should always start
with large fuzziness to guarantee that at the outset all models
have a nonzero similarity with all input data [9]. Moreover,
the small magnitude of $\alpha_3$ as compared with $\alpha_1$ and
$\alpha_2$ emphasizes the need for different learning times for
the assimilation of inputs of distinct nature. Here we let the
linguistic component evolve much more slowly than the
non-linguistic ones. Because of this slower rate, the time scale
for convergence of the dynamics is increased by a factor of 5,
as seen in the next four figures.
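
The role of these parameters can be sketched numerically. The snippet below assumes an exponentially annealed fuzziness, $\sigma_e^2(t) = \sigma_{ae}^2 \exp(-\alpha_e t) + \sigma_{be}^2$, which is the form suggested by the parameter names but is not restated in this section. The values 3 and 0.1 are those quoted for the linguistic component, whereas both decay rates are hypothetical and chosen only to show how a five-fold smaller rate stretches the time scale by the same factor.

```python
import math


def fuzziness(t, sigma_a, sigma_b, alpha):
    """Assumed schedule: sigma(t)^2 = sigma_a^2 * exp(-alpha * t) + sigma_b^2."""
    return math.sqrt(sigma_a ** 2 * math.exp(-alpha * t) + sigma_b ** 2)


def time_to_reach(target, sigma_a, sigma_b, alpha):
    """Time at which sigma(t) has decayed to `target` (requires target > sigma_b)."""
    return math.log(sigma_a ** 2 / (target ** 2 - sigma_b ** 2)) / alpha


ALPHA_FAST = 5.0  # hypothetical decay rate, standing in for a non-linguistic component
ALPHA_SLOW = 1.0  # hypothetical decay rate, five times smaller, for the linguistic one

start = fuzziness(0.0, sigma_a=3.0, sigma_b=0.1, alpha=ALPHA_SLOW)
t_fast = time_to_reach(0.2, 3.0, 0.1, ALPHA_FAST)
t_slow = time_to_reach(0.2, 3.0, 0.1, ALPHA_SLOW)
```

Under this assumed schedule the initial fuzziness is set almost entirely by $\sigma_{ae}$ and the final one by $\sigma_{be}$, so a large $\sigma_{a3}$ gives every model a nonzero similarity with every word at the outset.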

In Figs. 4 to 7 we show the time evolution of the three
components of the modeling fields when the linguistic
input is included.

Fig. 4. Evolution of the linguistic component (e = 3) of the modeling
fields, whose initial values were chosen randomly among the integers
1, 2, ..., 6.

Fig. 5. Evolution of the component e = 1 (feature A) of the six modeling
fields for the agent with language. Though barely visible on this scale, the
agent identifies the six distinct objects or categories (see Fig. 6).

Fig. 6. A closer view of the modeling fields associated with the two
overlapping objects displayed in Fig. 5 confirms that the agent indeed
discriminates feature A.

Fig. 7. Evolution of the component e = 2 (feature B) of the six modeling
fields for the agent with language. The distinction between the two
overlapping objects is more perceptible for this component.

The one-to-one correspondence between the input words
and the component e = 3 of the modeling fields is easily
achieved, as shown in Fig. 4. As soon as the agent
assimilates the fact that words 3 and 4 are different, which
happens at approximately t = 0.3, the two overlapping
objects are differentiated, as illustrated in Figs. 5, 6, and 7.
This is a remarkable finding: the extra information carried
by the linguistic component allowed the agent to create
distinct non-linguistic representations for the category
examples. In other words, the knowledge that
categories 3 and 4 are distinct, which was achieved through the
linguistic input, allowed the agent to redefine and refine its
expectations about features A and B. We wonder whether
the agent would create fictitious distinctions between those
features if their distributions were identical. As
already said, what is behind this result is the
interdependence of the features introduced in the field equations
by the fuzzy association variables, so that the learning of
one of the features affects the learning of the others. Finally,
we note that the asymptotic values of the modeling fields
illustrated in these figures do not match exactly the means of
the Gaussian distributions used to generate the data of Fig.
1, but they are close enough to them to identify
unambiguously the six categories. We have verified that the
matching improves when the number of examples of each
category increases.
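
The coupling between features through the fuzzy association variables can be illustrated with a minimal sketch. It assumes Gaussian similarities with one fuzziness per component and the usual normalized association $f(k|i) = l(i|k) / \sum_{k'} l(i|k')$. The two fields, the example and the fuzziness values are hypothetical and serve only to show the effect of the word component.

```python
import numpy as np


def associations(inputs, fields, sigma):
    """Fuzzy association f(k|i) = l(i|k) / sum_k' l(i|k') for Gaussian l.

    inputs: (N, d) examples, fields: (M, d) modeling fields,
    sigma: (d,) fuzziness of each component (assumed shared by all fields).
    """
    diff = inputs[:, None, :] - fields[None, :, :]       # shape (N, M, d)
    log_l = -0.5 * np.sum((diff / sigma) ** 2, axis=2)   # normalization constants cancel
    log_l -= log_l.max(axis=1, keepdims=True)            # numerical safety
    weights = np.exp(log_l)
    return weights / weights.sum(axis=1, keepdims=True)


# Two fields with identical expectations for features A and B but different words.
fields = np.array([[0.5, 0.5, 3.0],
                   [0.5, 0.5, 4.0]])
example = np.array([[0.5, 0.5, 3.0]])                    # the teacher uttered word 3
sigma = np.array([0.1, 0.1, 0.5])                        # hypothetical fuzziness values

without_language = associations(example[:, :2], fields[:, :2], sigma[:2])
with_language = associations(example, fields, sigma)
```

Without the word the example is shared equally between the two fields, while with the word it is assigned mostly to the first one. Because the same association weights every component of a field in the dynamics, this sharper assignment also changes how the fields for features A and B are updated.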


VI.  Conclusion 

We  have  reported  a  computational  experiment  in  which  the 
addition  of  language,  or  more  precisely  of  a  linguistic  signal, 
affects the manner in which an agent processes its other sensory
inputs.  Remarkably,  the  agent  with  language  is  capable  of 
differentiating  categories  that  it  could  not  distinguish 
without  language.  We  note  that  what  distinguishes  linguistic 
signals  (e.g.,  word  sounds)  from  other  stimuli  is  that  the 
agent  experiences  the  sounds  in  concomitance  with  non- 
linguistic  experience.  The  crucial  role  played  by  the 
linguistic signal in our experiment contrasts with the milder
claim that language enhances performance only if the
agent  has  already  evolved  an  ability  to  respond  appropriately 
to  the  visually  perceived  objects  without  language  [5]. 

In  our  scenario,  the  agent  develops  only  the  capacity  to 
“understand”  the  words  uttered  by  the  external  teacher;  the 
production  of  words  was  not  considered  as  it  must 
necessarily  involve  at  least  two  agents  (see  below).  The 
agent  “understands”  the  meaning  of  a  word  when  it 
associates  (i.e.,  grounds)  that  word  stimulus  with  a  concrete 
object  or  category  example  in  the  environment  [18].  This 
type  of  association  is  made  very  simple  in  the  MFT 
framework. Since the experiment reported here
assumes the presence of an external teacher with complete
knowledge of the agent’s environment, our results are
relevant to the issue of language acquisition by children
(see, e.g., [19]) rather than to the language evolution
problem.

As pointed out above, the study of the emergence of the ability
to produce linguistic signals requires the use of two or more
agents. There are two obstacles to adapting our discrimination
task scenario to the multi-agent situation, and so replacing the
external teacher by the agents themselves. First, we need to
assume that one agent (the speaker) can somehow
distinguish between the overlapping categories while the
other agent (the hearer) cannot. This type of unwarranted
assumption is made in the mushroom world scenario [5], [6].
A less far-fetched possibility is to assume that the agents can
perceive different features of the examples. It is then plausible
that examples put into the same category by one
agent are perceived as completely distinct by another agent,
because they process different features of their environment.
Second, the MFT framework is well suited to apprehending
characteristics of the environment that are produced by a
well-defined process that may or may not be corrupted by
noise. However, the mechanism that leads two agents to
reach a consensus on which word to assign to a given
category is not described by such a process but by a series of
guessing or naming games (see, e.g., [20], [21]) in which
similarity measures seem to play no role at all. In fact,
consider the case where two agents assign different words to
the same category and each agent broadcasts its linguistic
signal to the other. There is no reason for an agent to give up
its word in favor of that used by the other agent (as actually
happens in the case of an external teacher considered here)
and so no consensus will ever be reached in this case.
Hence, whereas the present framework looks very promising
to model acquisition of language in children, we do not see
how it could be applied to the formation of a common
lexicon in a community of agents. We refer the reader to
Ref. [22] for a framework that uses MFT to categorize
examples and the guessing games strategy to name the
newly created categories.

Abstract - In this paper we present some recent cognitive robotics studies on language and cognition integration to demonstrate
how  the  language  acquired  by  robotic  agents  can  be  directly  grounded  in  action  representations.  These  studies  are  characterized 
by  the  hypothesis  that  symbols  are  directly  grounded  into  the  agents'  own  categorical  representations,  whilst  at  the  same  time 
having  logical  (e.g.  syntactic)  relationships  with  other  symbols.  The  two  robotics  studies  are  based  on  the  combination  of  cognitive 
robotics  with  neural  modeling  methodologies  such  as  connectionist  models  and  modeling  field  theory.  Simulations  demonstrate  the 
efficacy  of  the  mechanisms  of  action  grounding  of  language  and  the  symbol  grounding  transfer  in  agents  that  acquire  a  lexicon  via 
imitation  and  linguistic  instructions.  The  paper  also  discusses  the  scientific  and  technological  implications  of  such  an  approach. 

I.  Introduction 


Recent advances in cognitive psychology, neuroscience and linguistics support an embodied view of
cognition, i.e. the view that cognitive functions (perception, categorization, reasoning, language) are
strictly intertwined with sensorimotor and emotional processes (Wilson 2002). This is particularly
evident in recent studies on the grounding of language in action and perception (Pecher & Zwaan 2004).
For example, in psycholinguistics, Glenberg & Kaschak (2002) have demonstrated the existence of
Action-sentence Compatibility Effects. In sentence comprehension tasks, participants are faster to judge
the sensibility of sentences implying motion toward the body (e.g. “Courtney gave you the notebook”)
when the response requires moving toward the body (i.e. pressing a button nearer the body). When the sentence
implied movement away from the body, participants were faster to respond by literally moving away
from their bodies (pressing a button farther from the body). The data support an embodied theory of meaning
that relates the meaning of sentences to human action and motor affordances. This is also consistent with
neuroscientific studies on action and language, such as the involvement of the mirror neuron system for
action and language learning (Rizzolatti & Arbib 1998) and brain imaging studies where words (e.g.
action verbs) activate cortical areas (e.g. motor and premotor cortex) in a somatotopic fashion
(Pulvermüller 1993). In linguistics, the link between the properties of language and their relationship
with cognitive processes has been formalized by cognitive and constructivist linguistic theories (e.g.
Talmy, 1980).

This growing empirical evidence is consistent with recent advances in artificial intelligence and
robotics, where the design of the capabilities of artificial cognitive agents is based on an integrated
cognitive approach (Perlovsky, this volume). For example, the linguistic capabilities of
interactive systems for human-robot communication are built (grounded) onto the robot’s other
sensorimotor and cognitive skills (Cangelosi et al. 2005; Feldman & Narayanan 2004). Robots acquire
words through direct interaction with their physical and social world, so that linguistic symbols do not
exist as arbitrary representations of some notion, but are intrinsically connected to behavioral or
cognitive abilities, based on the properties of the reference system they belong to. This task of
connecting the arbitrary symbols used in internal reasoning with external physical stimuli is known as
Symbol Grounding (Harnad 1990).

In  this  paper  we  will  present  some  recent  cognitive  robotics  studies  on  language  and  cognition 
integration  to  demonstrate  how  the  language  acquired  by  robotic  agents  can  be  directly  grounded  in 
action  representations.  These  studies  are  characterized  by  the  hypothesis  that  symbols  are  directly 
grounded  into  the  agents’  own  categorical  representations,  whilst  at  the  same  time  having  logical  (e.g. 
syntactic)  relationships  with  other  symbols.  First,  each  symbol  is  directly  grounded  into  internal 
categorical  representations.  These  representations  include  perceptual  categories  (e.g.  the  concept  of  blue 
color,  square  shape,  and  male  face),  sensorimotor  categories  (e.g.  the  action  concept  of  grasping, 
pushing,  and  carrying),  social  representations  (e.g.  individuals,  groups  and  relationships)  and  other 
categorizations  of  the  agent’s  own  internal  states  (e.g.  emotions  and  motivations).  These  categories  are 
connected  to  the  external  world  through  our  perceptual,  motor  and  cognitive  interactions  with  the 
environment.  Second,  symbols  also  have  logical  (e.g.  syntactic)  relationships  with  the  other  symbols  of 
the  lexicons  used  for  communication.  This  allows  symbols  to  be  combined,  using  compositional  rules 
such  as  grammar,  to  form  new  meanings.  For  example,  the  combination  of  the  two  symbols  “stripes” 
and  “horse”,  which  are  directly  grounded  into  the  agent’s  own  sensorimotor  experience  of  striped 
objects  and  horses  in  its  environment,  produces  the  new  concept  (and  word)  “zebra”.  This  new  symbol 
becomes  indirectly  grounded  in  the  agents’  experience  of  the  world  through  the  process  of  “symbol 
grounding  transfer”.  An  example  of  symbol  grounding  transfer  will  be  demonstrated  in  the  cognitive 
robotics  model  for  the  acquisition  and  combination  of  names  of  actions. 

The  two  cognitive  robotics  models  presented  below  will  demonstrate  the  mechanisms  of  action 
grounding  of  language  and  the  symbol  grounding  transfer  in  agents  that  acquire  a  lexicon  via  imitation 
and  linguistic  instructions.  These  models  are  based  on  the  combination  of  cognitive  robotics  with  neural 
modeling  methodologies  such  as  connectionist  models  and  modeling  field  theory. 

II. Cognitive Robotics and Connectionist Modelling of Symbol Grounding Transfer

Neural  networks  have  been  proposed  as  an  ideal  cognitive  modeling  methodology  to  deal  with  the 
symbol grounding problem (Harnad 1990). For example, connectionist models, such as multi-layer
perceptrons  (MLP),  permit  a  good  implementation  of  the  process  of  grounding  output  symbolic 
representations  in  the  (analogical)  input  representation  of  external  stimuli  (Plunkett  et  al.  1992; 
Cangelosi  2005).  The  same  feedforward  models  can  be  extended  to  simulate  the  process  of  grounding 
transfer  (Cangelosi  et  al.  2000).  More  recently,  these  connectionist  models  have  been  incorporated  in 
studies based on cognitive agents and robots. Cognitive robotics refers to the field of robotics that aims
at building autonomous cognitive systems capable of performing cognitive tasks such as perception,
categorization,  language  and  sensorimotor  problems.  Cognitive  robotics  approaches  include  epigenetic 
robotics  and  autonomous  mental  development  systems  (Weng  et  al.  2001),  as  well  as  evolutionary 
robotics  (Nolfi  &  Floreano  2000).  Here  we  briefly  present  a  cognitive  robotics  model  for  the  acquisition 
of  a  lexicon  of  words  of  action  and  for  the  grounding  transfer.  This  is  an  extension  of  the  first  cognitive 
robotics  model  for  symbol  grounding  in  language  comprehension  tasks  originally  developed  by 
Cangelosi  and  Riga  (2006).  The  new  model  presented  below  extends  the  previous  study  by  considering 
both  linguistic  comprehension  and  production  capabilities. 


A.  The  Robot 

The  robotics  model  consists  of  two  simulated  agents  (teacher  and  learner)  embedded  within  a  virtual 
simulated  environment  (Fig.  1).  Each  robot  consists  of  two  3-segment  arms  attached  to  a  torso  (6 
Degrees  of  Freedom).  This  is  further  connected  to  a  base  with  four  wheels,  which  were  not  used  in  the 
present  simulation.  Through  the  two  arms  the  robot  can  interact  with  the  environment  and  manipulate 
objects  placed  in  front  of  it.  Three  objects  were  used  in  the  current  simulation:  a  cube,  a  horizontal  plane 
and  a  vertical  bar.  The  agent  can  receive  in  the  input  retina  different  views  (perspectives)  of  each  object. 
The agent has to learn six basic actions: lower right shoulder, lower left shoulder, close right upperarm,
close left upperarm, close right elbow, close left elbow. It also learns the names of these basic
actions: “Lower_Right_Shoulder”, “Lower_Left_Shoulder”, “Close_Right_Upperarm”,
“Close_Left_Upperarm”, “Close_Right_Elbow”, “Close_Left_Elbow”. Each action is
associated with some of the above objects that are put in front of the agent. The close left and close right
shoulder actions are associated with different views of the cube.


Fig. 1: Simulation setup with the two robots. The teacher robot is on the left and the learner on the right.
The  agents  are  performing  the  close  left  elbow  action. 

This  system  is  implemented  using  ODE  (Open  Dynamics  Engine,  www.ode.org),  an  open  source,  high 
performance  library  for  simulating  rigid  body  dynamics.  ODE  is  useful  for  simulating  vehicles,  objects 
in  virtual  reality  environments  and  virtual  creatures,  and  it  is  being  increasingly  used  for  simulation 
studies  on  autonomous  cognitive  systems. 

The  first  agent,  the  teacher,  is  pre-programmed  to  perform  and  demonstrate  a  variety  of  basic  actions, 
each  associated  with  a  linguistic  signal.  These  are  demonstrated  to  the  second  robot,  the  learner,  which 
attempts  to  reproduce  the  actions  by  mimicking  them.  First  the  agent  acquires  basic  actions  by  observing 
the  teacher,  and  then  it  learns  the  basic  action  names  (direct  grounding).  Subsequently,  it  autonomously 
uses  the  linguistic  symbols  that  were  grounded  in  the  previous  learning  stage  to  acquire  new  higher- 
order  actions  (symbol  grounding  transfer). 

B. Neural network controller and training procedure

The imitator robot is endowed with an MLP neural network (Fig. 2) with input units for vision,
proprioceptive and linguistic input, and output units for motor control and linguistic output. For the robot
motor control, the motor output units encode the force that is being applied on each joint. Each action
consists of a sequence of 10 steps of motor activations.

Fig. 2: Architecture of the learner robot’s neural network controller.

We attain the grounding transfer using a 3-stage training process: (1) Basic Action (BA) learning, (2)
Entry-Level (EL) naming and (3) Higher-Level (HL) learning.

During  the  Basic  Action  learning  stage,  the  agent  learns  to  execute  all  six  basic  actions  in  association 
with  the  view  of  the  different  objects.  No  linguistic  elements  are  used  at  this  stage.  The  imitation 
algorithm  is  used  to  adjust  the  weights  contributing  to  the  activation  of  the  motor  units  using  supervised 
learning  (see  Cangelosi  et  al.  2006  for  the  learning  algorithm  details). 

The second learning stage, Entry-Level (EL) naming, is concerned with associating the previously
acquired behaviors with linguistic signals. It features three sequential activation cycles. The first EL cycle,
Linguistic Production, trains the learner to name the 6 basic actions. Motor (proprioceptive) and
visual (object view) information are given as input to the network. The agents learn to correctly activate
the output linguistic nodes corresponding to the basic action names. This is based on a standard
backpropagation algorithm. This linguistic production cycle implements the process of basic symbol
grounding, by which the names (symbols) that the agent is learning are directly grounded in its own
perceptual and sensorimotor experience. In the second EL cycle, Linguistic Comprehension, learner
agents are taught to correctly respond to a linguistic signal consisting of the name of the action, without
having the ability to perceive the object associated with the action. To accomplish this, the retinal units in
the network are set to 0, whilst the input units corresponding to the action name are activated. In the
final EL cycle, Imitation, both motor and linguistic inputs are activated, and the network
learns to reproduce the action in output and activate the corresponding action name unit. This third cycle
is necessary to permit the linking of the production and the comprehension tasks in the hidden units’
activation pattern (Cangelosi et al. 2000).
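
The three EL cycles differ mainly in which groups of input units are active, which the following Python sketch makes explicit. The layer sizes are hypothetical, as they are not listed in this section, and the action name is coded as a single localist unit. Two further assumptions are labeled in the comments: the proprioceptive input is passed through unchanged in the comprehension cycle, and the visual input is left as provided in the imitation cycle.

```python
import numpy as np

# Hypothetical layer sizes; the section does not list the actual ones.
N_VISION, N_PROPRIO, N_WORDS = 25, 6, 9


def build_input(cycle, vision, proprio, name_index):
    """Assemble the input vector [vision | proprioception | language] for one EL cycle."""
    language = np.zeros(N_WORDS)
    if cycle == "production":
        pass                             # no word is given; the network must name the action
    elif cycle == "comprehension":
        vision = np.zeros(N_VISION)      # retinal units set to 0
        language[name_index] = 1.0       # action name unit (proprioception kept: assumption)
    elif cycle == "imitation":
        language[name_index] = 1.0       # motor and linguistic inputs (vision kept: assumption)
    else:
        raise ValueError(f"unknown cycle: {cycle}")
    return np.concatenate([vision, proprio, language])


vision = np.ones(N_VISION)
proprio = np.full(N_PROPRIO, 0.5)
comprehension_input = build_input("comprehension", vision, proprio, name_index=2)
production_input = build_input("production", vision, proprio, name_index=2)
```

Keeping the three groups in fixed positions of one vector lets the same network be trained in all three cycles, so that the hidden units can link production and comprehension.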

The final training stage, Higher-Level (HL) learning, allows the learner agents to autonomously
acquire higher-order actions without the need for a demonstration from the teacher. This is achieved only
through a linguistic instruction strategy and a “mental simulation” strategy similar to Barsalou’s
perceptual symbol system hypothesis (Barsalou 1999). The teacher has only to provide new linguistic
instructions consisting of the names of two basic actions and the name of a new higher-order action, for
example “Lower_Right_Shoulder + Lower_Left_Shoulder = Place” for one of the three higher-order actions.


Once  the  teacher  (or  a  human  instructor)  provides  a  higher-order  instruction,  the  learner  goes  through 
four  HL  learning  cycles.  First  it  activates  only  the  input  unit  of  the  first  basic  action  name  to  produce 
and  store  (“memorize”)  the  corresponding  sequence  of  10  motor  activation  steps.  Second,  it  activates  in 
input  the  linguistic  units  for  the  first  basic  action  name  and  the  new  higher-order  action.  The  resulting  10 
motor  activations  are  compared  with  the  previously  stored  values  to  calculate  the  error  and  apply  the 
backpropagation  weight  corrections.  The  next  two  cycles  are  the  same  as  the  first  two,  except  that  the 
second  basic  action  name  unit  is  activated  as  well. 

The  Higher-Order  stage  permits  the  implementation  of  a  purely  autonomous  way  to  acquire  new 
actions  through  the  linguistic  combination  of  previously-learned  basic  action  names.  The  role  of  the 
teacher  in  this  stage  is  only  that  of  providing  a  linguistic  instruction,  without  the  need  to  give  a 
perceptual  demonstration  of  the  new  action.  The  motor  imitation  learning,  such  as  in  the  Basic  Action 
training  stage,  is  a  slow  process  based  on  continuous  supervision,  trial-and-error  feedback-based 
learning.  The  acquisition  of  a  new  concept  through  linguistic  instruction  is,  instead,  a  quicker  learning 
mechanism  because  it  requires  the  contribution  of  fewer  units  (the  localist  linguistic  units)  and 
corresponding weights. Moreover, in a related symbol grounding model on language (symbolic) vs.
trial-and-error (sensorimotor toil) learning of categories, the linguistic procedure consistently
outperforms the other learning method (Cangelosi & Harnad 2000).

To establish whether the agent has actually learned the new higher-order actions and transferred the grounding
from basic action names to higher-order names, a test phase is performed. This grounding transfer test
aims at evaluating the aptitude of the imitator agent to perform a new composite action with any of the
objects previously associated, in the absence of the linguistic descriptions of the basic actions. Thus the
agent is requested to respond solely to the signal of the composite action (e.g. Grab) and selectively to
the different views of the objects. In addition, while the imitator was taught only the motion of the
dissected action for each composite behavior, the test evaluated the performance of the higher-order
composite action. This was a behavior never seen before by the robot. The test stage comprised two
basic trials per behavior, using the different views of the objects. All inputs were propagated through the
network with no training occurring.

C.  Results 

We replicated the simulation experiment described above with five agents. Each agent had a different set of
random weights initialized in the range ±1. The three learning stages, Basic Action, Entry-Level and
Higher-Level learning, lasted for 1000, 3000 and 1500 epochs, respectively. This was the approximate
minimum number of epochs necessary to reach a good learning performance. The parameters of the
backpropagation algorithm were set as follows: BA stage, momentum 0.6 and learning rate 0.2; EL stage,
momentum 0.6 and learning rate 0.3; HL stage, momentum 0.8 and learning rate 0.2. The weights
were updated at the end of every action.
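
To make the role of these parameters concrete, the sketch below collects the quoted stage settings and applies the usual gradient-descent-with-momentum rule. The rule itself is assumed, since the text only states that standard backpropagation was used, and the gradient passed in is taken to be already accumulated over the steps of one action. The numerical weights and gradients are hypothetical.

```python
# Stage settings quoted in the text.
STAGES = {
    "BA": {"epochs": 1000, "momentum": 0.6, "learning_rate": 0.2},
    "EL": {"epochs": 3000, "momentum": 0.6, "learning_rate": 0.3},
    "HL": {"epochs": 1500, "momentum": 0.8, "learning_rate": 0.2},
}


def momentum_update(weights, gradients, previous_delta, stage):
    """One update: delta = -learning_rate * gradient + momentum * previous_delta."""
    params = STAGES[stage]
    delta = [
        -params["learning_rate"] * g + params["momentum"] * d
        for g, d in zip(gradients, previous_delta)
    ]
    new_weights = [w + dw for w, dw in zip(weights, delta)]
    return new_weights, delta


# Two consecutive updates of a single hypothetical weight in the BA stage.
weights, delta = momentum_update([1.0], [0.5], [0.0], "BA")
weights, delta = momentum_update(weights, [0.5], delta, "BA")
```

With a constant gradient the second step is larger than the first, which is the intended effect of the momentum term.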

Overall,  results  indicate  that  all  agents  were  able  to  learn  successfully  the  6  basic  actions  and  the  3 
higher-order behaviors. At the end of the Basic Action stage, the imitator was able to execute all actions flawlessly,
when  presented  with  an  object  (final  error  of  0.004).  The  overall  average  error  on  the  final  epoch  of  the 
Entry-Level  stage  was  0.03.  Finally,  in  the  grounding  transfer  test  the  agent  was  requested  to  perform  a 
new  composite  action  by  giving  in  input  only  the  new  action  name  or  the  new  name  together  with  the 
basic  action  names  (error  0.018).  These  results  confirm  our  hypothesis  that  previously  grounded 
symbols  are  transferred  to  the  new  behaviors. 

III. Action and Lexicon Scaling up with Modeling Field Theory

In this study we aim at extending the behavioral and linguistic capabilities of the robot by scaling up its
action repertoire. Perlovsky (2001; this volume) has recently proposed the use of the Modeling Field
Theory (MFT) learning algorithm to deal with the issue of the combinatorial complexity (CC) of
linguistic and cognitive modeling based on machine learning techniques such as multi-layer perceptrons.
The MFT algorithm uses dynamic logic to avoid CC and computes similarity
measures between internal concept-models and the perceptual and linguistic signals. By using
concept-models with multiple sensorimotor modalities, an MFT system can integrate language-specific signals
with other internal cognitive representations. Perlovsky’s proposal to apply MFT in the language
domain is highly consistent with the grounded approach to language modeling discussed above. That is,
both accounts are based on the strict integration of language and cognition. This permits the design of
cognitive systems that are truly able to “understand” the meaning of words being used by autonomously
linking the linguistic signals to the internal concept-models of the word constructed during the
sensorimotor interaction with the environment. The combination of MFT systems with grounded agent
simulations will make it possible to overcome the CC problems currently faced in grounded agent models
and to scale up the lexicons in terms of the number of lexical entries and syntactic categories.

Modeling  Field  Theory  is  based  on  the  principle  of  associating  lower-level  signals  (e.g.,  inputs, 
bottom-up  signals)  with  higher-level  concept-models  (e.g.  internal  representations,  categories/concepts, 
top-down  signals)  avoiding  the  combinatorial  complexity  inherent  to  such  a  task.  This  is  achieved  by 
using  measures  of  similarity  between  concept-models  and  input  signals  together  with  a  new  type  of 
logic,  so-called  dynamic  logic.  MFT  may  be  viewed  as  an  unsupervised  learning  algorithm  whereby  a 
series  of  concept-models  adapt  to  the  features  of  the  input  stimuli  via  gradual  adjustment  dependent  on 
the  fuzzy  similarity  measures. 

A.  Extended  action  and  lexicon  repertoire 

The robotic scenario is based on the same simulated robotic agents described in the previous section
(see Fig. 1). The teacher robot is pre-programmed to demonstrate an extended action repertoire of 112
actions. The learner robot uses MFT to learn to reproduce those actions as well as to learn the action
names.

The main difference with respect to the previous model is the use of 112 different actions. These are
inspired by the semaphore flag signaling alphabet. For the encoding of the actions, we collected data on
the posture of the teacher robot using 6 features, i.e. 3 pairs of angles for the two joints of the shoulder,
upper arm and elbow joints. In this simulation, objects are not present.

While performing an action, the teacher agent can emit a three-letter word labeling it. Each
label is a Consonant-Vowel-Consonant word, such as “XUM”, “HAW” or “RIV”. All
consonants and vowels of the English alphabet are used. Each letter is encoded using two real-valued
features in the interval [0,1], so each action word is represented by 6 features. Each word is
unique to the action performed.
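
As an illustration of how a 12-feature input could be assembled from a posture and its label, the following is a minimal sketch. The text does not say how the two features per letter are computed, so `encode_letter` is a placeholder assumption (scaled alphabet position plus a vowel flag), and all function names are hypothetical.

```python
import string

VOWELS = "AEIOU"
CONSONANTS = "".join(c for c in string.ascii_uppercase if c not in VOWELS)


def encode_letter(letter):
    """Placeholder 2-feature code (an assumption, not the paper's encoding):
    alphabet position scaled to [0, 1] and a vowel flag."""
    position = string.ascii_uppercase.index(letter) / 25.0
    is_vowel = 1.0 if letter in VOWELS else 0.0
    return [position, is_vowel]


def encode_word(word):
    """Encode a Consonant-Vowel-Consonant label as 6 features in [0, 1]."""
    word = word.upper()
    is_cvc = (
        len(word) == 3
        and word[0] in CONSONANTS
        and word[1] in VOWELS
        and word[2] in CONSONANTS
    )
    if not is_cvc:
        raise ValueError("label must be a Consonant-Vowel-Consonant word")
    features = []
    for letter in word:
        features.extend(encode_letter(letter))
    return features


def build_input(joint_angles, word):
    """Concatenate 6 normalized joint angles with the 6 word features."""
    if len(joint_angles) != 6:
        raise ValueError("expected 6 joint-angle features")
    return list(joint_angles) + encode_word(word)
```

With 21 consonants and 5 vowels there are 21 × 5 × 21 = 2205 possible Consonant-Vowel-Consonant words, so assigning a unique label to each of the 112 actions is not a constraint.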

B.  MFT  algorithm 

We use a multi-dimensional MFT algorithm (Tikhanoff et al. 2006) with 112 input fields (concept-models)
randomly initialized. We consider the action learning problem as that of categorizing $N=112$
objects (actions) $i=1,\ldots,N$, each of which is characterized by $d=12$ features $e=1,\ldots,d$. These features are
represented by real numbers $O_{ie} \in (0,1)$, the input signals. These 12 features correspond to the 6 joint
rotation angles and 6 phonetic encoding values. Moreover, we assume that there are $M=112$ $d$-dimensional
fields (i.e. concept-models of the prototype of actions/words to be learned) $k=1,\ldots,M$
described by real-valued fields $S_{ke}$, with $e=1,\ldots,d$ as before. The concept-models will tend to match the
input object features $O_{ie}$ during learning by maximizing a global similarity function in which

$$l(i|k) = \prod_{e=1}^{d} \left(2\pi\sigma_{ke}^{2}\right)^{-1/2} \exp\left[-\left(S_{ke}-O_{ie}\right)^{2}/2\sigma_{ke}^{2}\right]$$

is the similarity measure between object $i$ and concept $k$. Here $\sigma_{ke}$ is the fuzziness parameter that
gradually decreases over time. Full details on the learning algorithm can be found in Tikhanoff et al.
2007. See also Perlovsky (this volume) for an overview of the MFT algorithm.
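
The similarity measure between an object and a concept-model can be sketched in a few lines of plain Python. Two parts of this sketch are assumptions not given in the text here: the global similarity is taken in the form $L = \sum_i \log \sum_k l(i|k)$, and the fields are adapted by a plain gradient-ascent step on that function. All function names and the `rate` parameter are hypothetical.

```python
import math


def partial_similarity(obj, field, sigma):
    """l(i|k): product over features of Gaussians centred on the field values."""
    value = 1.0
    for o, s, sg in zip(obj, field, sigma):
        norm = math.sqrt(2.0 * math.pi * sg ** 2)
        value *= math.exp(-((s - o) ** 2) / (2.0 * sg ** 2)) / norm
    return value


def associations(objects, fields, sigmas):
    """Fuzzy association weights f(k|i) = l(i|k) / sum over k' of l(i|k')."""
    table = []
    for obj in objects:
        sims = [partial_similarity(obj, f, sg) for f, sg in zip(fields, sigmas)]
        total = sum(sims)
        table.append([s / total for s in sims])
    return table


def global_similarity(objects, fields, sigmas):
    """Assumed form of the global similarity: L = sum_i log sum_k l(i|k)."""
    total = 0.0
    for obj in objects:
        total += math.log(
            sum(partial_similarity(obj, f, sg) for f, sg in zip(fields, sigmas))
        )
    return total


def update_fields(objects, fields, sigmas, rate):
    """One gradient-ascent step on L with respect to each S_ke (assumed rule).
    dL/dS_ke = sum_i f(k|i) * (O_ie - S_ke) / sigma_ke^2."""
    weights = associations(objects, fields, sigmas)
    new_fields = []
    for k, (field, sigma) in enumerate(zip(fields, sigmas)):
        new_field = []
        for e, s in enumerate(field):
            grad = sum(
                weights[i][k] * (objects[i][e] - s) / sigma[e] ** 2
                for i in range(len(objects))
            )
            new_field.append(s + rate * grad)
        new_fields.append(new_field)
    return new_fields
```

With a large fuzziness every object is associated with every field to a similar degree; as the fuzziness shrinks, the weights sharpen and each field is drawn toward a single object. For very small fuzziness values the products above can underflow to zero, so a practical implementation would likely work with log-similarities instead.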

C.  Results 

The simulation lasts for 25000 training steps. In the first 12500 cycles, only the 6 action features
(angles) are provided. This is enough for the agents to learn to reproduce the action repertoire. At cycle
12500 (half of the training time), all 12 features (6 for actions/angles, 6 for phonetic sounds) are
considered when computing the MFT fuzzy similarity functions. The re-initialization of the fuzziness
parameter $\sigma_{ke}$ at timestep 12500 allows the agent to learn the new sound features and create a
concept-model of the labels.
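
The two-phase schedule can be sketched as follows. The step counts and feature counts come from the text; the exponential decay law and the values of `sigma_start` and `sigma_end` are assumptions made only for illustration, since the text states only that the fuzziness decreases gradually and is re-initialized at the switch.

```python
TOTAL_STEPS = 25000        # length of the simulation (from the text)
SWITCH_STEP = 12500        # cycle at which the word features are introduced
N_ACTION_FEATURES = 6      # joint angles
N_FEATURES = 12            # joint angles plus phonetic features


def active_features(step):
    """Indices of the features used in the similarity at a given step."""
    count = N_ACTION_FEATURES if step < SWITCH_STEP else N_FEATURES
    return list(range(count))


def fuzziness(step, sigma_start=1.0, sigma_end=0.05):
    """Hypothetical exponential decay of the fuzziness, restarted at SWITCH_STEP.
    The decay law and both default values are assumptions."""
    phase_step = step if step < SWITCH_STEP else step - SWITCH_STEP
    progress = phase_step / SWITCH_STEP
    return sigma_start * (sigma_end / sigma_start) ** progress
```

Restarting the decay at the switch gives the newly added phonetic features the same broad initial associations that the angle features had at the start of training.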

Results  demonstrate  that  the  robot  is  able  to  categorize  95%  of  actions  and  learn  their  unique  labels. 
Figure  3  shows  the  evolution  of  the  112  concept-model  fields  during  training.  Note  the  resetting  of  the 
fields at timestep 12500, when words are introduced and the fuzziness $\sigma_{ke}$ is reinitialized.

Figure 3 - Evolution of 112 concept-models during training. Vertical axis represents a compressed one-dimensional
representation of the concept-models using the amplitude $S_k$.

IV.  Discussion  and  Conclusions 

The simulation experiments above concern language grounding in action and
symbol grounding transfer in cognitive robotic agents. The positive results of the grounding transfer
simulation demonstrate that it is possible to design autonomous linguistic agents capable of acquiring
new grounded concepts.

The use of MFT to overcome the CC limitations of connectionist models demonstrates that it is
possible to scale up the action and lexicon repertoire of the cognitive robotic agents. Perlovsky’s (2004)
proposal to apply MFT in the language domain is highly consistent with the grounded approach to
language modeling discussed above. That is, both accounts are based on the strict integration of
language and cognition. This permits the design of cognitive systems that are truly able to “understand”
the meaning of the words being used by autonomously linking the linguistic signals to the internal
concept-models of the word constructed during the sensorimotor interaction with the environment.

The potential impact of this grounded cognitive robotic approach on the development of intelligent
systems is great, both for cognitive science and for technology. In cognitive science, the area of
embodied cognition concerns the study of the functioning and organization of cognition in natural and
artificial systems. For example, the Higher-Order learning procedure is inspired by Barsalou’s
“reenactment” and “mental simulation” mechanism in the perceptual symbol system hypothesis.
Barsalou (1997) demonstrates that during perceptual experience, association areas in the brain capture
bottom-up patterns of activation in sensory-motor areas. Later, in a top-down manner, association areas
partially reactivate sensory-motor areas to implement perceptual symbol simulators. A simulation
platform like the one used here can be used to test further embodied cognition theories of language, such
as Glenberg and Kaschak’s (2002) action-compatibility effects. In addition, such an approach can be
used to study the development and emergence of language in epigenetic robots (Weng et al. 2001; Metta
et al. 2006).

As for technological implications, the model proposed here can be useful in fields
such as defense systems, service robotics and human-robot interaction. In the area of defense
systems, cognitive systems are essential for integrated multi-platform systems capable of sensing and
communicating. Such robots can be beneficial in collaborative and distributed tasks such as multi-agent
exploration and navigation in unknown terrains. In service and household robotics, future systems will
be able to learn language and world understanding from humans, and also to interact with them for
entertainment purposes (e.g. Tikhanoff & Miranda 2005; Steels & Kaplan 2000). In human-robot
communication systems, robots will develop their lexicon through close interaction with their
environment and while communicating with humans. Such a social learning context can permit a more
efficient acquisition of communication capabilities in autonomous robots, as demonstrated in Steels &
Kaplan (2000).